Method and Apparatus for Decoding a Predicted Image
Patent Abstract:
Embodiments of the present invention provide an image prediction method and apparatus. The image prediction method includes: determining (S410), according to information about image units adjacent to an image unit to be processed, whether a set of candidate prediction modes for the image unit to be processed includes an affine fusion mode, where the affine fusion mode indicates that respective predicted images of the image unit to be processed and the adjacent image units of the image unit to be processed are obtained by using the same affine model; parsing (S420) a bit stream to obtain first indication information; determining (S430), in the set of candidate prediction modes, a prediction mode for the image unit to be processed according to the first indication information; and determining (S440) a predicted image of the image unit to be processed according to the prediction mode. The method reduces a bit rate of encoding a prediction mode, thereby improving encoding efficiency.

Publication number: BR112018006271B1
Application number: R112018006271-5
Filing date: 2016-09-08
Publication date: 2021-01-26
Inventors: Huanbang Chen; Sixin Lin; Hong Zhang
Applicant: Huawei Technologies Co., Ltd.
Patent Description:
TECHNICAL FIELD

[001] The present invention relates to the field of video coding and compression, and in particular to an image prediction method and apparatus.

BACKGROUND

[002] Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video game devices, video game consoles, cellular or satellite radio telephones, video conferencing devices, video streaming devices, and the like. Digital video devices implement video compression technologies, such as the video compression technologies described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10: the Advanced Video Coding (AVC) standard, and ITU-T H.265: the High Efficiency Video Coding (HEVC) standard, and described in extensions of such standards, to transmit and receive digital video information more efficiently. By implementing such video coding technologies, a video device can transmit, receive, encode, decode, and/or store digital video information more efficiently.

[003] Video compression technologies include spatial (intra-image) prediction and/or temporal (inter-image) prediction, to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (that is, a video frame or a part of a video frame) can be partitioned into multiple video blocks. A video block may also be referred to as a tree block, a coding unit (CU), and/or a coding node. Video blocks in an intra-coded (I) slice of an image are encoded by means of spatial prediction relative to reference samples in adjacent blocks in the same image. Video blocks in an inter-coded (P or B) slice of an image can use spatial prediction relative to reference samples in adjacent blocks in the same image, or temporal prediction relative to reference samples in another reference image. An image may be referred to as a frame, and a reference image may be referred to as a reference frame.

[004] Spatial or temporal prediction results in a predictive block for a block to be encoded. Residual data indicates a pixel difference between the original block to be encoded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples forming the predictive block, and according to the residual data indicating the difference between the block to be encoded and the predictive block. An intra-coded block is encoded according to an intra-coding mode and residual data. For additional compression, the residual data can be transformed from a pixel domain to a transform domain, thereby generating residual transform coefficients. The residual transform coefficients can then be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, can be scanned sequentially to generate a one-dimensional vector of transform coefficients, and entropy coding can be applied to achieve even more compression.

SUMMARY

[005] The present invention describes an image prediction method with improved coding efficiency. According to prediction information or sizes of image units adjacent to an image unit to be processed, or according to a set of candidate prediction modes marked at an area level, a prediction mode for the image unit to be processed is derived.
Because prior information is provided for encoding the prediction mode, a bit rate of encoding the prediction mode is reduced, thereby improving encoding efficiency.

[006] In accordance with the technology of the present invention, a method for decoding a predicted image includes: determining, according to information about image units adjacent to an image unit to be processed, whether a set of candidate prediction modes for the image unit to be processed includes an affine fusion mode, where the affine fusion mode indicates that respective predicted images of the image unit to be processed and the adjacent image units of the image unit to be processed are obtained by using the same affine model; parsing a bit stream to obtain first indication information; determining, in the set of candidate prediction modes, a prediction mode for the image unit to be processed according to the first indication information; and determining a predicted image of the image unit to be processed according to the prediction mode.

[007] The adjacent image units of the image unit to be processed include at least the adjacent image units on the top, left, top right, bottom left, and top left of the image unit to be processed.

[008] In accordance with the technology of the present invention, determining, according to information about image units adjacent to an image unit to be processed, whether a set of candidate prediction modes for the image unit to be processed includes an affine fusion mode includes the following implementations:

[009] A first implementation includes: when a prediction mode of at least one of the adjacent image units is to obtain a predicted image by using an affine model, parsing the bit stream to obtain second indication information, where, when the second indication information is 1, the set of candidate prediction modes includes the affine fusion mode, or, when the second indication information is 0, the set of candidate prediction modes does not include the affine fusion mode; otherwise, the set of candidate prediction modes does not include the affine fusion mode.

[010] A second implementation includes: when a prediction mode of at least one of the adjacent image units is to obtain a predicted image by using an affine model, the set of candidate prediction modes includes the affine fusion mode; otherwise, the set of candidate prediction modes does not include the affine fusion mode.
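For concreteness, the following is a minimal decoder-side sketch of the first and second implementations. It is an illustration under stated assumptions, not the patent's reference implementation: the `PredMode` tags, the `BitstreamReader` stub, and the function names are hypothetical, and a real decoder would obtain the flag from its entropy-decoding engine.

```cpp
#include <vector>
#include <cstddef>

// Hypothetical prediction-mode tag for the adjacent image units.
enum class PredMode { Translational, Affine };

// Stub bitstream reader; a real decoder would entropy-decode here.
struct BitstreamReader {
    std::vector<int> bits;
    std::size_t pos = 0;
    bool parse_flag() { return bits.at(pos++) != 0; }
};

// First implementation ([009]): the affine fusion mode is a candidate
// only if at least one adjacent unit is affine AND a parsed flag is 1.
bool has_affine_fusion_impl1(const std::vector<PredMode>& neighbors,
                             BitstreamReader& bs) {
    for (PredMode m : neighbors) {
        if (m == PredMode::Affine) {
            return bs.parse_flag(); // second indication information
        }
    }
    return false; // no affine neighbor: affine fusion mode excluded
}

// Second implementation ([010]): no flag is parsed; the presence of an
// affine neighbor alone decides the candidate set.
bool has_affine_fusion_impl2(const std::vector<PredMode>& neighbors) {
    for (PredMode m : neighbors) {
        if (m == PredMode::Affine) return true;
    }
    return false;
}
```

The point of both variants is the same: at most one flag is spent, and only in the situation where the affine fusion mode is statistically plausible, instead of signaling the mode for every image unit.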
[011] A third implementation includes: the information about the adjacent image units is the prediction modes of the adjacent image units; the prediction modes include at least a first affine mode, in which a predicted image is obtained by using a first affine model, or a second affine mode, in which a predicted image is obtained by using a second affine model; and correspondingly the affine fusion mode includes at least a first affine fusion mode that merges the first affine mode or a second affine fusion mode that merges the second affine mode. Correspondingly, determining, according to information about image units adjacent to an image unit to be processed, whether a set of candidate prediction modes for the image unit to be processed includes an affine fusion mode includes: when the first affine mode is the most prevalent among the prediction modes of the adjacent prediction units, the set of candidate prediction modes includes the first affine fusion mode and does not include the second affine fusion mode; when the second affine mode is the most prevalent among the prediction modes of the adjacent prediction units, the set of candidate prediction modes includes the second affine fusion mode and does not include the first affine fusion mode; or, when a non-affine mode is the most prevalent among the prediction modes of the adjacent prediction units, the set of candidate prediction modes does not include the affine fusion mode.

[012] The third implementation additionally includes: when the first affine mode is the most prevalent among the prediction modes of the adjacent prediction units, the set of candidate prediction modes includes the first affine fusion mode and does not include the second affine fusion mode; when the second affine mode is the most prevalent among the prediction modes of the adjacent prediction units, the set of candidate prediction modes includes the second affine fusion mode and does not include the first affine fusion mode; when a non-affine mode is the most prevalent and the first affine mode is the second most prevalent among the prediction modes of the adjacent prediction units, the set of candidate prediction modes includes the first affine fusion mode and does not include the second affine fusion mode; or, when a non-affine mode is the most prevalent and the second affine mode is the second most prevalent among the prediction modes of the adjacent prediction units, the set of candidate prediction modes includes the second affine fusion mode and does not include the first affine fusion mode.

[013] A fourth implementation includes: when a prediction mode of at least one of the adjacent image units is to obtain a predicted image by using an affine model, and a width and a height of the at least one of the adjacent image units are respectively smaller than a width and a height of the image unit to be processed, parsing the bit stream to obtain third indication information, where, when the third indication information is 1, the set of candidate prediction modes includes the affine fusion mode, or, when the third indication information is 0, the set of candidate prediction modes does not include the affine fusion mode; otherwise, the set of candidate prediction modes does not include the affine fusion mode.
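The third implementation can be sketched as a simple counting rule over the modes of the adjacent units. Again this is a hedged illustration, not the patent's implementation: the enums are hypothetical, and the tie-breaking order between equally frequent modes is an assumption, since the text does not specify one.

```cpp
#include <vector>

// Hypothetical tags for the neighbor modes named in [011]-[012].
enum class PredMode { NonAffine, FirstAffine, SecondAffine };

// Which affine fusion mode, if any, the candidate set contains.
enum class AffineFusion { None, First, Second };

// Variant of [012]: the most prevalent neighbor mode wins, and when a
// non-affine mode is most prevalent, the runner-up affine mode (if any)
// still selects a fusion mode. Ties are broken toward the first affine
// mode (an assumption).
AffineFusion select_affine_fusion(const std::vector<PredMode>& neighbors) {
    int nonAffine = 0, first = 0, second = 0;
    for (PredMode m : neighbors) {
        if (m == PredMode::FirstAffine)       ++first;
        else if (m == PredMode::SecondAffine) ++second;
        else                                  ++nonAffine;
    }
    if (first > nonAffine && first >= second) return AffineFusion::First;
    if (second > nonAffine && second > first) return AffineFusion::Second;
    // A non-affine mode ranks first; check the second-ranked mode ([012]).
    if (first == 0 && second == 0) return AffineFusion::None;
    return (first >= second) ? AffineFusion::First : AffineFusion::Second;
}
```

Under the stricter rule of [011], the last two lines would instead return `AffineFusion::None` whenever the non-affine count ranks first.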
[014] A fifth implementation includes: when a prediction mode of at least one of the adjacent image units is to obtain a predicted image by using an affine model, and a width and a height of the at least one of the adjacent image units are respectively smaller than a width and a height of the image unit to be processed, the set of candidate prediction modes includes the affine fusion mode; otherwise, the set of candidate prediction modes does not include the affine fusion mode.

[015] According to the technology of the present invention, a method for decoding a predicted image includes: parsing a bit stream to obtain first indication information; determining a set of candidate modes for a first image area to be processed according to the first indication information, where, when the first indication information is 0, a set of candidate translational modes is used as the set of candidate modes for the first image area to be processed, where a translational mode indicates a prediction mode in which a predicted image is obtained by using a translational model, or, when the first indication information is 1, a set of candidate translational modes and a set of candidate affine modes are used as the set of candidate modes for the first image area to be processed, where an affine mode indicates a prediction mode in which a predicted image is obtained by using an affine model; parsing the bit stream to obtain second indication information; determining, in the set of candidate prediction modes for the first image area to be processed, a prediction mode for an image unit to be processed according to the second indication information, where the image unit to be processed belongs to the first image area to be processed; and determining a predicted image of the image unit to be processed according to the prediction mode. A sketch of this area-level signaling follows after the next paragraph.

[016] The first image area to be processed includes one of an image frame group, an image frame, an image tile set, an image slice set, an image tile, an image slice, an image coding unit set, or an image coding unit.

[017] In one example, a method for decoding a predicted image includes: determining, according to information about image units adjacent to an image unit to be processed, whether a set of candidate prediction modes for the image unit to be processed includes an affine fusion mode, where the affine fusion mode indicates that respective predicted images of the image unit to be processed and the adjacent image units of the image unit to be processed are obtained by using the same affine model; parsing a bit stream to obtain first indication information; determining, in the set of candidate prediction modes, a prediction mode for the image unit to be processed according to the first indication information; and determining a predicted image of the image unit to be processed according to the prediction mode.
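A compact sketch of the area-level signaling in [015]-[016]. The truncated unary index coding and all names are illustrative assumptions; the patent only requires that the first indication information selects the candidate set for the whole area and that the second indication information selects a mode inside that set.

```cpp
#include <vector>
#include <cstddef>

enum class Mode { TranslationalFusion, TranslationalAmvp,
                  AffineFusion, AffineAmvp };

// Stub bitstream reader; a real decoder would entropy-decode here.
struct BitstreamReader {
    std::vector<int> bits;
    std::size_t pos = 0;
    int read_bit() { return bits.at(pos++); }
};

// First indication information ([015]): one flag per image area.
// 0 -> translational candidate modes only; 1 -> translational + affine.
std::vector<Mode> candidate_modes_for_area(BitstreamReader& bs) {
    std::vector<Mode> set = {Mode::TranslationalFusion,
                             Mode::TranslationalAmvp};
    if (bs.read_bit() == 1) {
        set.push_back(Mode::AffineFusion);
        set.push_back(Mode::AffineAmvp);
    }
    return set;
}

// Second indication information: selects the mode of one image unit
// inside the area's candidate set (here, a truncated unary index).
Mode mode_for_unit(const std::vector<Mode>& set, BitstreamReader& bs) {
    std::size_t idx = 0;
    while (idx + 1 < set.size() && bs.read_bit() == 1) ++idx;
    return set.at(idx);
}
```

Because every image unit in the area shares the candidate set, image units in an area that never uses affine motion pay no per-unit signaling cost for the affine modes.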
[018] In another example, a method for encoding a predicted image includes: determining, according to information about image units adjacent to an image unit to be processed, whether a set of candidate prediction modes for the image unit to be processed includes an affine fusion mode, where the affine fusion mode indicates that respective predicted images of the image unit to be processed and the adjacent image units of the image unit to be processed are obtained by using the same affine model; determining, in the set of candidate prediction modes, a prediction mode for the image unit to be processed; determining a predicted image of the image unit to be processed according to the prediction mode; and encoding first indication information into a bit stream, where the first indication information indicates the prediction mode.

[019] In another example, a method for decoding a predicted image includes: parsing a bit stream to obtain first indication information; determining a set of candidate modes for a first image area to be processed according to the first indication information, where, when the first indication information is 0, a set of candidate translational modes is used as the set of candidate modes for the first image area to be processed, where a translational mode indicates a prediction mode in which a predicted image is obtained by using a translational model, or, when the first indication information is 1, a set of candidate translational modes and a set of candidate affine modes are used as the set of candidate modes for the first image area to be processed, where an affine mode indicates a prediction mode in which a predicted image is obtained by using an affine model; parsing the bit stream to obtain second indication information; determining, in the set of candidate prediction modes for the first image area to be processed, a prediction mode for an image unit to be processed according to the second indication information, where the image unit to be processed belongs to the first image area to be processed; and determining a predicted image of the image unit to be processed according to the prediction mode.

[020] In another example, a method for encoding a predicted image includes: when a set of candidate translational modes is used as a set of candidate modes for a first image area to be processed, setting first indication information to 0 and encoding the first indication information into a bit stream, where a translational mode indicates a prediction mode in which a predicted image is obtained by using a translational model, or, when a set of candidate translational modes and a set of candidate affine modes are used as a set of candidate modes for a first image area to be processed, setting the first indication information to 1 and encoding the first indication information into the bit stream, where an affine mode indicates a prediction mode in which a predicted image is obtained by using an affine model; determining, in the set of candidate prediction modes for the first image area to be processed, a prediction mode for an image unit to be processed, where the image unit to be processed belongs to the first image area to be processed; determining a predicted image of the image unit to be processed according to the prediction mode; and encoding second indication information into the bit stream, where the second indication information indicates the prediction mode.
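The encoder side of [020] mirrors the decoder sketch above: it writes the area flag once and then, per image unit, an index into the agreed candidate set. A minimal sketch with the same hypothetical names and the same assumed truncated unary code:

```cpp
#include <vector>
#include <cstddef>

enum class Mode { TranslationalFusion, TranslationalAmvp,
                  AffineFusion, AffineAmvp };

struct BitstreamWriter {
    std::vector<int> bits;
    void write_bit(int b) { bits.push_back(b); }
};

// First indication information ([020]): 0 = translational set only,
// 1 = translational + affine candidate modes for the whole area.
void encode_area_flag(bool area_uses_affine, BitstreamWriter& bs) {
    bs.write_bit(area_uses_affine ? 1 : 0);
}

// Second indication information: position of the chosen mode in the
// candidate set (the rate-distortion search that chooses it is elided),
// written as a truncated unary code matching the decoder sketch above.
void encode_unit_mode(const std::vector<Mode>& set, Mode chosen,
                      BitstreamWriter& bs) {
    std::size_t idx = 0;
    while (set.at(idx) != chosen) {
        bs.write_bit(1);
        ++idx;
    }
    if (idx + 1 < set.size()) bs.write_bit(0); // no terminator for last
}
```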
[021] In another example, an apparatus for decoding a predicted image includes: a first determination module, configured to determine, according to information about image units adjacent to an image unit to be processed, whether a set of candidate prediction modes for the image unit to be processed includes an affine fusion mode, where the affine fusion mode indicates that respective predicted images of the image unit to be processed and the adjacent image units of the image unit to be processed are obtained by using the same affine model; a parsing module, configured to parse a bit stream to obtain first indication information; a second determination module, configured to determine, in the set of candidate prediction modes, a prediction mode for the image unit to be processed according to the first indication information; and a third determination module, configured to determine a predicted image of the image unit to be processed according to the prediction mode.

[022] In another example, an apparatus for encoding a predicted image includes: a first determination module, configured to determine, according to information about image units adjacent to an image unit to be processed, whether a set of candidate prediction modes for the image unit to be processed includes an affine fusion mode, where the affine fusion mode indicates that respective predicted images of the image unit to be processed and the adjacent image units of the image unit to be processed are obtained by using the same affine model; a second determination module, configured to determine, in the set of candidate prediction modes, a prediction mode for the image unit to be processed; a third determination module, configured to determine a predicted image of the image unit to be processed according to the prediction mode; and an encoding module, configured to encode first indication information into a bit stream, where the first indication information indicates the prediction mode.

[023] In another example, an apparatus for decoding a predicted image includes: a first parsing module, configured to parse a bit stream to obtain first indication information; a first determination module, configured to determine a set of candidate modes for a first image area to be processed according to the first indication information, where, when the first indication information is 0, a set of candidate translational modes is used as the set of candidate modes for the first image area to be processed, where a translational mode indicates a prediction mode in which a predicted image is obtained by using a translational model, or, when the first indication information is 1, a set of candidate translational modes and a set of candidate affine modes are used as the set of candidate modes for the first image area to be processed, where an affine mode indicates a prediction mode in which a predicted image is obtained by using an affine model; a second parsing module, configured to parse the bit stream to obtain second indication information; a second determination module, configured to determine, in the set of candidate prediction modes for the first image area to be processed, a prediction mode for an image unit to be processed according to the second indication information, where the image unit to be processed belongs to the first image area to be processed; and a third determination module, configured to determine a predicted image of the image unit to be processed according to the prediction mode.
[024] In another example, an apparatus for encoding a predicted image includes: a first encoding module, configured to: when a set of candidate translational modes is used as a set of candidate modes for a first image area to be processed, set first indication information to 0 and encode the first indication information into a bit stream, where a translational mode indicates a prediction mode in which a predicted image is obtained by using a translational model, or, when a set of candidate translational modes and a set of candidate affine modes are used as a set of candidate modes for a first image area to be processed, set the first indication information to 1 and encode the first indication information into the bit stream, where an affine mode indicates a prediction mode in which a predicted image is obtained by using an affine model; a first determination module, configured to determine, in the set of candidate prediction modes for the first image area to be processed, a prediction mode for an image unit to be processed, where the image unit to be processed belongs to the first image area to be processed; a second determination module, configured to determine a predicted image of the image unit to be processed according to the prediction mode; and a second encoding module, configured to encode second indication information into the bit stream, where the second indication information indicates the prediction mode.

[025] In another example, a device for decoding video data is provided. The device includes a video decoder that is configured to perform the following operations: determining, according to information about image units adjacent to an image unit to be processed, whether a set of candidate prediction modes for the image unit to be processed includes an affine fusion mode, where the affine fusion mode indicates that respective predicted images of the image unit to be processed and the adjacent image units of the image unit to be processed are obtained by using the same affine model; parsing a bit stream to obtain first indication information; determining, in the set of candidate prediction modes, a prediction mode for the image unit to be processed according to the first indication information; and determining a predicted image of the image unit to be processed according to the prediction mode.

[026] In another example, a device for encoding video data is provided. The device includes a video encoder that is configured to perform the following operations: determining, according to information about image units adjacent to an image unit to be processed, whether a set of candidate prediction modes for the image unit to be processed includes an affine fusion mode, where the affine fusion mode indicates that respective predicted images of the image unit to be processed and the adjacent image units of the image unit to be processed are obtained by using the same affine model; determining, in the set of candidate prediction modes, a prediction mode for the image unit to be processed; determining a predicted image of the image unit to be processed according to the prediction mode; and encoding first indication information into a bit stream, where the first indication information indicates the prediction mode.

[027] In another example, a device for decoding video data is provided.
The device includes a video decoder that is configured to perform the following operations: parsing a bit stream to obtain first indication information; determining a set of candidate modes for a first image area to be processed according to the first indication information, where, when the first indication information is 0, a set of candidate translational modes is used as the set of candidate modes for the first image area to be processed, where a translational mode indicates a prediction mode in which a predicted image is obtained by using a translational model, or, when the first indication information is 1, a set of candidate translational modes and a set of candidate affine modes are used as the set of candidate modes for the first image area to be processed, where an affine mode indicates a prediction mode in which a predicted image is obtained by using an affine model; parsing the bit stream to obtain second indication information; determining, in the set of candidate prediction modes for the first image area to be processed, a prediction mode for an image unit to be processed according to the second indication information, where the image unit to be processed belongs to the first image area to be processed; and determining a predicted image of the image unit to be processed according to the prediction mode.

[028] In another example, a device for encoding video data is provided. The device includes a video encoder that is configured to perform the following operations: when a set of candidate translational modes is used as a set of candidate modes for a first image area to be processed, setting first indication information to 0 and encoding the first indication information into a bit stream, where a translational mode indicates a prediction mode in which a predicted image is obtained by using a translational model, or, when a set of candidate translational modes and a set of candidate affine modes are used as a set of candidate modes for a first image area to be processed, setting the first indication information to 1 and encoding the first indication information into the bit stream, where an affine mode indicates a prediction mode in which a predicted image is obtained by using an affine model; determining, in the set of candidate prediction modes for the first image area to be processed, a prediction mode for an image unit to be processed, where the image unit to be processed belongs to the first image area to be processed; determining a predicted image of the image unit to be processed according to the prediction mode; and encoding second indication information into the bit stream, where the second indication information indicates the prediction mode.

[029] In another example, a computer-readable storage medium that stores an instruction is provided.
When executed, the instruction causes one or more processors of a device for decoding video data to perform the following operations: determining, according to information about image units adjacent to an image unit to be processed, whether a set of candidate prediction modes for the image unit to be processed includes an affine fusion mode, where the affine fusion mode indicates that respective predicted images of the image unit to be processed and the adjacent image units of the image unit to be processed are obtained by using the same affine model; parsing a bit stream to obtain first indication information; determining, in the set of candidate prediction modes, a prediction mode for the image unit to be processed according to the first indication information; and determining a predicted image of the image unit to be processed according to the prediction mode.

[030] In another example, a computer-readable storage medium that stores an instruction is provided. When executed, the instruction causes one or more processors of a device for encoding video data to perform the following operations: determining, according to information about image units adjacent to an image unit to be processed, whether a set of candidate prediction modes for the image unit to be processed includes an affine fusion mode, where the affine fusion mode indicates that respective predicted images of the image unit to be processed and the adjacent image units of the image unit to be processed are obtained by using the same affine model; determining, in the set of candidate prediction modes, a prediction mode for the image unit to be processed; determining a predicted image of the image unit to be processed according to the prediction mode; and encoding first indication information into a bit stream, where the first indication information indicates the prediction mode.

[031] In another example, a computer-readable storage medium that stores an instruction is provided. When executed, the instruction causes one or more processors of a device for decoding video data to perform the following operations: parsing a bit stream to obtain first indication information; determining a set of candidate modes for a first image area to be processed according to the first indication information, where, when the first indication information is 0, a set of candidate translational modes is used as the set of candidate modes for the first image area to be processed, where a translational mode indicates a prediction mode in which a predicted image is obtained by using a translational model, or, when the first indication information is 1, a set of candidate translational modes and a set of candidate affine modes are used as the set of candidate modes for the first image area to be processed, where an affine mode indicates a prediction mode in which a predicted image is obtained by using an affine model; parsing the bit stream to obtain second indication information; determining, in the set of candidate prediction modes for the first image area to be processed, a prediction mode for an image unit to be processed according to the second indication information, where the image unit to be processed belongs to the first image area to be processed; and determining a predicted image of the image unit to be processed according to the prediction mode.

[032] In another example, a computer-readable storage medium that stores an instruction is provided.
When executed, the instruction causes one or more processors of a device for encoding video data to perform the following operations: when a set of candidate translational modes is used as a set of candidate modes for a first image area to be processed, setting first indication information to 0 and encoding the first indication information into a bit stream, where a translational mode indicates a prediction mode in which a predicted image is obtained by using a translational model, or, when a set of candidate translational modes and a set of candidate affine modes are used as a set of candidate modes for a first image area to be processed, setting the first indication information to 1 and encoding the first indication information into the bit stream, where an affine mode indicates a prediction mode in which a predicted image is obtained by using an affine model; determining, in the set of candidate prediction modes for the first image area to be processed, a prediction mode for an image unit to be processed, where the image unit to be processed belongs to the first image area to be processed; determining a predicted image of the image unit to be processed according to the prediction mode; and encoding second indication information into the bit stream, where the second indication information indicates the prediction mode.

BRIEF DESCRIPTION OF THE DRAWINGS

[033] To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly describes the accompanying drawings required for describing the embodiments or the prior art. The accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.

[034] Figure 1 is a schematic block diagram of a video encoding system according to an embodiment of the present invention.

[035] Figure 2 is a schematic block diagram of a video encoder according to an embodiment of the present invention.

[036] Figure 3 is a schematic flowchart illustrating an example operation of a video encoder according to an embodiment of the present invention.

[037] Figure 4 is a schematic diagram of positions of a block to be processed and of reconstructed blocks adjacent to the block to be processed according to an embodiment of the present invention.

[038] Figure 5 is a schematic block diagram of another video encoder according to an embodiment of the present invention.

[039] Figure 6 is a schematic flowchart illustrating another example operation of a video encoder according to an embodiment of the present invention.

[040] Figure 7 is a schematic block diagram of yet another video encoder according to an embodiment of the present invention.

[041] Figure 8 is a schematic block diagram of a video decoder according to an embodiment of the present invention.

[042] Figure 9 is a schematic flowchart illustrating an example operation of a video decoder according to an embodiment of the present invention.

[043] Figure 10 is a schematic block diagram of another video decoder according to an embodiment of the present invention.

[044] Figure 11 is a schematic flowchart illustrating another example operation of a video decoder according to an embodiment of the present invention.

[045] Figure 12 is a schematic block diagram of yet another video decoder according to an embodiment of the present invention.
DESCRIPTION OF EMBODIMENTS

[046] The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. The described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.

[047] Motion compensation is one of the key technologies in video coding for improving compression efficiency. Conventional motion compensation based on block matching is a method widely applied in mainstream video encoders, and especially in video coding standards. In the block-matching-based motion compensation method, an inter-predicted block uses a translational motion model, and the translational motion model assumes that the motion vectors in all pixel positions of a block are the same. However, this assumption is not valid in many cases. In practice, the motion of an object in a video is usually a complex combination of motions such as translation, rotation, and zooming. If a pixel block contains these complex motions, a predicted signal obtained by using the conventional block-matching-based motion compensation method is not accurate. Consequently, inter-frame correlation cannot be removed completely. To solve this problem, a high-order motion model is introduced for motion compensation in video coding. The high-order motion model has more degrees of freedom than the translational motion model, and allows the pixels of an inter-predicted block to have different motion vectors. That is, a motion vector field generated by using the high-order motion model is more accurate.

[048] An affine motion model described based on control points is a representative type of high-order motion model. Unlike in the conventional translational motion model, the value of the motion vector of each pixel point in a block is related to the position of that pixel point, and is a first-order linear function of the coordinate position. The affine motion model allows deformation transforms such as rotation or zooming of a reference block, and a more accurate predicted block can be obtained through motion compensation.

[049] The foregoing type of inter prediction, in which a predicted block is obtained by motion compensation using the affine motion model, is generally referred to as an affine mode. In current mainstream video compression coding standards, the inter prediction type includes two modes: an advanced motion vector prediction mode (Advanced Motion Vector Prediction, AMVP) and a fusion mode (Fusion). In AMVP, for each coding block, a prediction direction, a reference frame index, and a difference between an actual motion vector and a predicted motion vector need to be transferred explicitly. In the fusion mode, however, the motion information of a current coding block is derived directly from a motion vector of an adjacent block. The affine mode and an inter prediction mode such as AMVP or Fusion that is based on the translational motion model can be combined to form a new inter prediction mode such as AMVP or Fusion that is based on the affine motion model. For example, a fusion mode based on the affine motion model can be referred to as an affine fusion mode (Affine Fusion).
In a process of selecting a prediction mode, the new prediction modes and the prediction modes in current standards participate together in a "performance/cost" comparison process, to select an optimal mode as the prediction mode and generate a predicted image of a block to be processed. In general, a prediction mode selection result is encoded, and the encoded prediction mode selection result is transmitted to a decoding side.

[050] The affine mode can improve the accuracy of a predicted block and improve coding efficiency. On the other hand, however, in the affine mode more bits need to be consumed to encode the motion information of the control points than are needed for the uniform motion information based on the translational motion model. Furthermore, because the number of candidate prediction modes increases, the bits used to encode a prediction mode selection result also increase. Such additional bit rate consumption affects the improvement in coding efficiency.

[051] According to the technical solutions of the present invention, on the one hand, whether a set of candidate prediction modes for an image unit to be processed includes an affine fusion mode is determined according to prediction mode information or size information of the adjacent image units of the image unit to be processed; a bit stream is parsed to obtain indication information; a prediction mode for the image unit to be processed is determined in the set of candidate prediction modes according to the indication information; and a predicted image of the image unit to be processed is determined according to the prediction mode. On the other hand, a bit stream is parsed to obtain indication information; whether a particular area uses a set of candidate prediction modes including an affine mode is determined by using the indication information; a prediction mode is determined according to the set of candidate prediction modes and other received indication information; and a predicted image is generated.

[052] Therefore, the prediction mode information or size information of the adjacent image units of the image unit to be processed can be used as prior knowledge for encoding the prediction information of the image unit to be processed. Indication information indicating the set of candidate prediction modes in an area can also be used as prior knowledge for encoding the prediction information of the image unit to be processed. The prior knowledge guides the encoding of the prediction mode, reducing the bit rate of encoding mode selection information and thereby improving coding efficiency.

[053] Furthermore, there are multiple solutions for improving the efficiency of encoding motion information of an affine model, for example, patent applications CN201010247275.7, CN201410584175.1, CN201410526608.8, CN201510085362.X, PCT/CN2015/073969, CN201510249484.8, CN201510391765.7, and CN201510543542.8, which are incorporated herein in their entirety by reference. It should be understood that, because the specific technical problems solved are different, the technical solutions of the present invention can be applied to the solutions listed above, to further improve coding efficiency.

[054] It should be further understood that the affine model is a general term for non-translational motion models. Actual motions including rotation, zooming, deformation, perspective change, and others can all be used for motion estimation and motion compensation in inter prediction by establishing different motion models, and these are referred to for short as a first affine model, a second affine model, and so on.
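As a concrete illustration (the patent text does not fix a particular parameterization, so the following form is one common choice rather than the invention's definition), a first-order affine motion model of the kind described in [048] maps a pixel position $(x, y)$ to a motion vector $(v_x, v_y)$:

$$
\begin{aligned}
v_x(x, y) &= a_1 + a_2\,x + a_3\,y,\\
v_y(x, y) &= a_4 + a_5\,x + a_6\,y.
\end{aligned}
$$

The six-parameter form can represent translation, rotation, zooming, and shearing; restricting $a_5 = -a_3$ and $a_6 = a_2$ gives a four-parameter model covering translation, rotation, and zooming only. Differently constrained models of this kind are examples of what is called above a first affine model and a second affine model; a pure translation corresponds to $a_2 = a_3 = a_5 = a_6 = 0$.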
[055] Figure 1 is a schematic block diagram of a video encoding system 10 according to an embodiment of the present invention. As described in this specification, the term "video coder" generically refers to both a video encoder and a video decoder. In the present invention, the terms "video coding" and "coding" may generally refer to video encoding or video decoding.

[056] As shown in Figure 1, the video encoding system 10 includes a source device 12 and a destination device 14. The source device 12 generates encoded video data. Therefore, the source device 12 may be referred to as a video encoding device or a video encoding apparatus. The destination device 14 can decode the encoded video data generated by the source device 12. Therefore, the destination device 14 may be referred to as a video decoding device or a video decoding apparatus. The source device 12 and the destination device 14 may be examples of video coding devices or video coding apparatuses. The source device 12 and the destination device 14 may include a wide range of devices, including desktop computers, mobile computing devices, notebook (for example, laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, video game consoles, in-vehicle computers, or the like.

[057] The destination device 14 can receive the encoded video data from the source device 12 via a channel 16. The channel 16 may include one or more media and/or devices that can move the encoded video data from the source device 12 to the destination device 14. In one example, the channel 16 may include one or more communication media that allow the source device 12 to directly transmit the encoded video data to the destination device 14 in real time. In this example, the source device 12 can modulate the encoded video data according to a communication standard (for example, a wireless communication protocol), and can transmit the modulated video data to the destination device 14. The one or more communication media may include wireless and/or wired communication media, for example, a radio frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form a part of a packet-based network (for example, a local area network, a wide area network, or a global network (such as the Internet)). The one or more communication media may include a router, a switch, a base station, or another device that facilitates communication from the source device 12 to the destination device 14.

[058] In another example, the channel 16 may include a storage medium that stores the encoded video data generated by the source device 12. In this example, the destination device 14 can access the storage medium through disk access or card access. The storage medium may include a variety of locally accessed data storage media such as a Blu-ray disc, a DVD, a CD-ROM, a flash memory, or other suitable digital storage media for storing encoded video data.

[059] In another example, the channel 16 may include a file server or another intermediate storage device that stores the encoded video data generated by the source device 12.
In this example, the destination device 14 can access, by streaming or downloading, the encoded video data stored on the file server or the other intermediate storage device. The file server may be a type of server that can store the encoded video data and transmit the encoded video data to the destination device 14. Examples of the file server include a web server (for example, for a website), a file transfer protocol (FTP) server, a network attached storage (NAS) device, and a local disk drive.

[060] The destination device 14 can access the encoded video data through a standard data connection (such as an Internet connection). Example types of data connections include a wireless channel (for example, a Wi-Fi connection), a wired connection (for example, a DSL or a cable modem), or a combination of both that is suitable for accessing the encoded video data stored on the file server. The transmission of the encoded video data from the file server may be streaming transmission, download transmission, or a combination of both.

[061] The technologies of the present invention are not limited to wireless applications or settings. The technologies can be applied to video coding in support of a variety of multimedia applications, for example, over-the-air television broadcasting, cable television transmission, satellite television transmission, streaming video transmission (for example, via the Internet), encoding of video data stored on a data storage medium, decoding of video data stored on a data storage medium, or other applications. In some examples, the video encoding system 10 can be configured to support unidirectional or bidirectional video transmission, to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.

[062] In the example of Figure 1, the source device 12 includes a video source 18, a video encoder 20, and an output interface 22. In some examples, the output interface 22 may include a modulator/demodulator (a modem) and/or a transmitter. The video source 18 may include a video capture device (such as a video camera), a video archive containing previously captured video data, a video feed interface for receiving video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of such video data sources.

[063] The video encoder 20 can encode video data from the video source 18. In some examples, the source device 12 directly transmits the encoded video data to the destination device 14 by using the output interface 22. Alternatively, the encoded video data may be stored on the storage medium or the file server for later access by the destination device 14 for decoding and/or playback.

[064] In the example of Figure 1, the destination device 14 includes an input interface 28, a video decoder 30, and a display device 32. In some examples, the input interface 28 includes a receiver and/or a modem. The input interface 28 can receive the encoded video data via the channel 16. The display device 32 may be integrated with the destination device 14, or may be outside the destination device 14. In general, the display device 32 displays decoded video data. The display device 32 may include a variety of display devices, for example, a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or another type of display device.
[065] The video encoder 20 and the video decoder 30 can operate according to a video compression standard (such as the High Efficiency Video Coding (H.265) standard), and may comply with the HEVC Test Model (HM). The ITU-T H.265 (V3) (04/2015) text description of the H.265 standard was released on April 29, 2015, and can be downloaded from http://handle.itu.int/11.1002/1000/12455. The entire content of the file is incorporated herein by reference.

[066] Alternatively, the video encoder 20 and the video decoder 30 can operate according to other proprietary or industry standards. The standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also referred to as ISO/IEC MPEG-4 AVC), and include the Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions. However, the technologies of the present invention are not limited to any particular coding standard or technology.

[067] Furthermore, Figure 1 is merely an example, and the technologies of the present invention can be applied to a video coding setting (for example, video encoding or video decoding) that does not necessarily include any data communication between the encoding device and the decoding device. In other examples, data is retrieved from a local memory, streamed over a network, or operated on in a similar manner. The encoding apparatus may encode data and store the encoded data in the memory, and/or the decoding apparatus may retrieve data from the memory and decode the data. In many examples, encoding and decoding are performed by multiple devices that do not communicate with each other, but simply encode data to a memory and/or retrieve data from a memory and decode the data.

[068] Each of the video encoder 20 and the video decoder 30 may be implemented as any of a variety of suitable circuits, for example, one or more microprocessors, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), discrete logic, hardware, or any combination thereof. If the technologies are partially implemented in software, an apparatus may store an instruction for the software in a suitable non-transitory computer-readable storage medium, and may execute the instruction in hardware by using one or more processors to perform the technologies of the present invention. Any of the foregoing (including hardware, software, a combination of hardware and software, and the like) may be considered one or more processors. Each of the video encoder 20 and the video decoder 30 may be included in one or more encoders or decoders, and either of the video encoder 20 and the video decoder 30 may be integrated as a part of a combined encoder/decoder (an encoder and decoder (CODEC)) in a respective device.

[069] The present invention may generally refer to the video encoder 20 as "signaling" particular information to another device (such as the video decoder 30). The term "signaling" may generally refer to the communication of a syntax element and/or other data that represents encoded video data. Such communication may occur in real time or near real time. Alternatively, such communication may occur over a period of time, for example, may occur when a syntax element is stored in a computer-readable storage medium as an encoded bit stream during encoding. The syntax element can then be retrieved by the decoding device at any time after being stored in this medium.
[070] As briefly mentioned above, the video encoder 20 encodes video data. The video data may include one or more images. Each of the images may be a still image. In some examples, an image may be referred to as a video "frame". The video encoder 20 can generate a bit stream, and the bit stream includes a sequence of bits that forms an encoded representation of the video data. The encoded representation of the video data may include an encoded image and associated data. The encoded image is an encoded representation of an image. The associated data may include a sequence parameter set (SPS), a picture parameter set (PPS), and other syntax structures. The SPS may include parameters applicable to zero or more sequences of images. The PPS may include parameters applicable to zero or more images. A syntax structure may be a set of zero or more syntax elements presented together in a bit stream in a specified order.

[071] To generate an encoded representation of an image, the video encoder 20 can partition the image into a grid of coding tree blocks (CTBs). In some examples, a CTB may be referred to as a "tree block", a "largest coding unit" (LCU), or a "coding tree unit". A CTB of HEVC may be generally analogous to a macroblock of a previous standard (such as H.264/AVC). However, a CTB is not necessarily limited to a particular size and may include one or more coding units (CUs).

[072] Each of the CTBs can be associated with a different, equally sized block of pixels within the image. Each pixel may include one luminance (luma) sample and two chrominance (chroma) samples. Therefore, each CTB may be associated with one block of luminance samples and two blocks of chrominance samples. For ease of explanation, in the present invention, a two-dimensional pixel array may be referred to as a pixel block, and a two-dimensional sample array may be referred to as a sample block. The video encoder 20 can partition, through quadtree partitioning, a pixel block associated with a CTB into pixel blocks associated with CUs, which is why the CTBs are referred to as "coding tree blocks".

[073] The CTBs of an image can be grouped into one or more slices. In some examples, each slice includes an integer quantity of CTBs. As a part of encoding an image, the video encoder 20 can generate an encoded representation (that is, an encoded slice) of each slice of the image. To generate an encoded slice, the video encoder 20 can encode each CTB of the slice to generate an encoded representation (that is, an encoded CTB) of each of the CTBs of the slice.

[074] To generate an encoded CTB, the video encoder 20 can perform quadtree partitioning recursively on a pixel block associated with a CTB to progressively partition the pixel block into smaller pixel blocks, as illustrated in the sketch after this paragraph. Each of the smaller pixel blocks may be associated with a CU. A partitioned CU may be a CU whose pixel block is partitioned into pixel blocks associated with other CUs. A non-partitioned CU may be a CU whose pixel block is not partitioned into pixel blocks associated with other CUs.

[075] The video encoder 20 can generate one or more prediction units (PUs) for each non-partitioned CU. Each of the PUs of the CU may be associated with a different pixel block within the pixel block of the CU. The video encoder 20 can generate a predictive pixel block for each PU of the CU. The predictive pixel block of the PU may be a pixel block.

[076] The video encoder 20 can generate a predictive pixel block for a PU through intra prediction or inter prediction. If the video encoder 20 generates a predictive pixel block of a PU through intra prediction, the video encoder 20 can generate the predictive pixel block of the PU based on decoded pixels of the image associated with the PU. If the video encoder 20 generates a predictive pixel block of a PU through inter prediction, the video encoder 20 can generate the predictive pixel block of the PU based on decoded pixels of one or more images other than the image associated with the PU.
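A minimal sketch of the recursive quadtree partitioning described in [071]-[074]; the `Block` structure and the `split_flag` decision are hypothetical stand-ins for the encoder's data structures and its rate-distortion split decision.

```cpp
#include <vector>

// Hypothetical description of a square pixel block inside a CTB.
struct Block { int x, y, size; };

// Stand-in for the encoder's split decision (normally the result of a
// rate-distortion search); here: split every block larger than 8x8.
bool split_flag(const Block& b) { return b.size > 8; }

// Recursively partition a CTB-sized block into leaf CUs ([074]).
void quadtree_partition(const Block& b, std::vector<Block>& leaf_cus) {
    if (!split_flag(b)) {
        leaf_cus.push_back(b); // a non-partitioned CU
        return;
    }
    int h = b.size / 2;
    // Four equally sized sub-blocks: top-left, top-right,
    // bottom-left, bottom-right.
    quadtree_partition({b.x,     b.y,     h}, leaf_cus);
    quadtree_partition({b.x + h, b.y,     h}, leaf_cus);
    quadtree_partition({b.x,     b.y + h, h}, leaf_cus);
    quadtree_partition({b.x + h, b.y + h, h}, leaf_cus);
}
// With the stub decision above, a 64x64 CTB yields sixty-four 8x8 leaf
// CUs; a real encoder stops splitting wherever prediction works best.
```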
[077] The video encoder 20 can generate a residual pixel block for a CU based on the predictive pixel blocks of the PUs of the CU. The residual pixel block of the CU may indicate differences between samples in the predictive pixel blocks of the PUs of the CU and corresponding samples in the original pixel block of the CU.

[078] Furthermore, as a part of encoding a non-partitioned CU, the video encoder 20 can perform recursive quadtree partitioning on the residual pixel block of the CU to partition the residual pixel block of the CU into one or more smaller residual pixel blocks associated with transform units (TUs) of the CU. Because each of the pixels in the pixel blocks associated with the TUs includes one luminance sample and two chrominance samples, each of the TUs may be associated with one residual sample block of luminance samples and two residual sample blocks of chrominance samples.

[079] The video encoder 20 can apply one or more transforms to the residual sample blocks associated with the TUs to generate coefficient blocks (that is, blocks of coefficients). The video encoder 20 can perform a quantization process on each of the coefficient blocks. Quantization generally refers to a process in which coefficients are quantized to possibly reduce the amount of data used to represent the coefficients, thereby performing additional compression.

[080] The video encoder 20 can generate a set of syntax elements that represent the coefficients in the quantized coefficient blocks. The video encoder 20 can apply an entropy encoding operation (such as a context-adaptive binary arithmetic coding (CABAC) operation) to at least some of these syntax elements.

[081] To apply CABAC encoding to a syntax element, the video encoder 20 can binarize the syntax element to form a binary string including a series of one or more bits (which are referred to as "bins"). The video encoder 20 can encode some of the bins through regular CABAC encoding and can encode the other bins through bypass encoding.

[082] When the video encoder 20 encodes a sequence of bins through regular CABAC encoding, the video encoder 20 can first identify a coding context. The coding context can identify probabilities of coding bins having particular values. For example, a coding context may indicate that a probability of coding a bin whose value is 0 is 0.7 and a probability of coding a bin whose value is 1 is 0.3. After identifying the coding context, the video encoder 20 can divide an interval into a lower subinterval and an upper subinterval. One of the subintervals may be associated with the value 0, and the other subinterval may be associated with the value 1. A width of a subinterval may be proportional to the probability indicated for the associated value by the identified coding context.

[083] If a bin of the syntax element has the value associated with the lower subinterval, an encoded value may be equal to a lower boundary of the lower subinterval. If the same bin of the syntax element has the value associated with the upper subinterval, the encoded value may be equal to a lower boundary of the upper subinterval. To encode a next bin of the syntax element, the video encoder 20 can repeat these steps with respect to the interval that is the subinterval associated with the value of the encoded bin. When the video encoder 20 repeats these steps for the next bin, the video encoder 20 can use a probability that is modified based on the probability indicated by the identified coding context and the actual value of the encoded bin.
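A highly simplified numeric sketch of the interval subdivision in [082]-[083]. Floating point is used purely for readability: a real CABAC engine uses integer range arithmetic, renormalization, and adaptive context states, none of which are modeled here.

```cpp
// Toy arithmetic encoder for the interval-subdivision idea of
// [082]-[083]; not a real CABAC engine.
struct ToyArithmeticEncoder {
    double low = 0.0, high = 1.0;
    double p0 = 0.7; // context: probability that the next bin is 0

    void encode_bin(int bin) {
        double split = low + (high - low) * p0;
        if (bin == 0) high = split; // lower subinterval, width ~ p0
        else          low  = split; // upper subinterval, width ~ 1 - p0
        // A real engine would also adapt p0 from the coded bin and
        // renormalize the interval here.
    }

    // Any value inside [low, high) identifies the coded bin sequence.
    double value() const { return (low + high) / 2.0; }
};

// Usage: with p0 = 0.7, encoding the bins 0, 0, 1 narrows the interval
// [0, 1) to [0, 0.7), then [0, 0.49), then [0.343, 0.49).
```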
To encode a next binary of the syntax element, the video encoder 20 can repeat these steps, with the sub-interval associated with the value of the encoded binary serving as the new interval. When the video encoder 20 repeats these steps for the next binary, the video encoder 20 can use a probability that is modified based on the probability indicated by the identified encoding context and the actual value of the encoded binary.

[084] When the video encoder 20 encodes a sequence of binaries by means of bypass encoding, the video encoder 20 may be able to encode several binaries in a single cycle, whereas when the video encoder 20 encodes a sequence of binaries by means of regular CABAC encoding, the video encoder 20 may be able to encode only a single binary in a cycle. Bypass encoding can be simpler because the video encoder 20 does not need to select a context and can assume that the probabilities of both symbols (0 and 1) are 1/2 (50%). Therefore, in bypass encoding, the interval is split directly in half. In effect, bypass encoding bypasses the context-adaptive part of the arithmetic encoding mechanism.

[085] Performing bypass encoding on a binary requires less computation than performing regular CABAC encoding on the binary. In addition, performing bypass encoding can allow a greater degree of parallelization and a higher throughput. Binaries encoded by means of bypass encoding can be referred to as "bypass encoded binaries".

[086] In addition to performing entropy encoding on the syntax elements of a coefficient block, the video encoder 20 can apply inverse quantization and an inverse transform to a coefficient block in order to reconstruct a residual sample block from the coefficient block. The video encoder 20 can add samples of the reconstructed residual sample block to corresponding samples of one or more predictive sample blocks to generate a reconstructed sample block. By reconstructing a sample block for each color component, the video encoder 20 can reconstruct the pixel block associated with a TU. By reconstructing pixel blocks for each TU of a CU in this way, the video encoder 20 can reconstruct the pixel blocks of the CU.

[087] After the video encoder 20 reconstructs the pixel blocks of the CU, the video encoder 20 can perform a deblocking operation to reduce blocking artifacts associated with the CU. After the video encoder 20 performs the deblocking operation, the video encoder 20 can modify the reconstructed pixel blocks of the CTBs of an image by using sample adaptive offset (SAO). In general, adding offset values to the pixels in the image can improve coding efficiency. After performing these operations, the video encoder 20 can store the reconstructed pixel block of the CU in a decoded picture buffer for use in generating a predictive pixel block for another CU.

[088] The video decoder 30 can receive a bit stream. The bit stream can include an encoded representation of the video data encoded by the video encoder 20. The video decoder 30 can parse the bit stream to extract syntax elements from the bit stream. As a part of extracting at least some of the syntax elements from the bit stream, the video decoder 30 can entropy decode data in the bit stream.

[089] When the video decoder 30 performs CABAC decoding, the video decoder 30 can perform regular CABAC decoding on some binaries and can perform bypass decoding on other binaries. When the video decoder 30 performs regular CABAC decoding on a syntax element, the video decoder 30 can identify an encoding context.
The video decoder 30 can then divide an interval into a lower sub-interval and an upper sub-interval. One of the sub-intervals can be associated with the value 0 and the other sub-interval can be associated with the value 1. The width of a sub-interval can be proportional to the probability indicated for the associated value by the identified encoding context. If the encoded value is within the lower sub-interval, the video decoder 30 can decode a binary having the value associated with the lower sub-interval. If the encoded value is within the upper sub-interval, the video decoder 30 can decode a binary having the value associated with the upper sub-interval. To decode a next binary of the syntax element, the video decoder 30 can repeat these steps, with the sub-interval that includes the encoded value serving as the new interval. When the video decoder 30 repeats these steps for the next binary, the video decoder 30 can use a probability that is modified based on the probability indicated by the identified encoding context and the decoded binary. The video decoder 30 can then debinarize the binaries to restore the syntax element. Debinarization can mean selecting a syntax element value according to a mapping between a binary series and a syntax element value.

[090] When the video decoder 30 performs bypass decoding, the video decoder 30 may be able to decode several binaries in a single cycle, but when the video decoder 30 performs regular CABAC decoding, the video decoder 30 generally may be able to decode only a single binary in a cycle, or may require more than one cycle for a single binary. Bypass decoding can be simpler than regular CABAC decoding because the video decoder 30 does not need to select a context and can assume that the probabilities of both symbols (0 and 1) are 1/2. In this way, performing bypass encoding and/or bypass decoding on a binary may require less computation than performing regular CABAC coding on the binary, and may allow a greater degree of parallelization and a higher throughput.

[091] The video decoder 30 can reconstruct an image of the video data based on the syntax elements extracted from the bit stream. A process of reconstructing the video data based on the syntax elements in general can be reciprocal to the process performed by the video encoder 20 to generate the syntax elements. For example, the video decoder 30 can generate, based on syntax elements associated with a CU, predictive pixel blocks for the PUs of the CU. Furthermore, the video decoder 30 can inversely quantize coefficient blocks associated with the TUs of the CU. The video decoder 30 can perform an inverse transform on the coefficient blocks to reconstruct the residual pixel blocks associated with the TUs of the CU. The video decoder 30 can reconstruct the pixel blocks of the CU based on the predictive pixel blocks and the residual pixel blocks.

[092] After the video decoder 30 reconstructs the pixel blocks of the CU, the video decoder 30 can perform a deblocking operation to reduce blocking artifacts associated with the CU. Furthermore, based on one or more SAO syntax elements, the video decoder 30 can apply the SAO offsets applied by the video encoder 20. After the video decoder 30 performs these operations, the video decoder 30 can store the pixel blocks of the CU in a decoded picture buffer. The decoded picture buffer can provide reference images for subsequent motion compensation, intra prediction, and presentation on a display device.
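The interval subdivision of paragraphs [082] to [090] can be illustrated with a toy arithmetic coder. The following Python sketch is illustrative only: it uses floating-point intervals rather than the renormalized integer arithmetic and adaptive context tables of real CABAC, and the context identifiers are assumptions made for the example.

```python
# Toy illustration of the interval subdivision described above. Regular
# binaries consume a context-supplied probability; bypass binaries always
# split the current interval in half (p0 = 0.5).

def encode_binaries(binaries, p_zero_by_ctx, ctx_for_binary):
    low, width = 0.0, 1.0
    for i, b in enumerate(binaries):
        ctx = ctx_for_binary(i)                      # None marks a bypass binary
        p0 = 0.5 if ctx is None else p_zero_by_ctx[ctx]
        split = width * p0                           # lower sub-interval <-> value 0
        if b == 0:
            width = split                            # keep the lower sub-interval
        else:
            low, width = low + split, width - split  # keep the upper sub-interval
    return low, low + width  # any value in [low, high) identifies the binaries

# First binary regular (context 'sig', probability of 0 assumed 0.7), the
# remaining binaries bypass encoded:
low, high = encode_binaries([1, 0, 1], {'sig': 0.7},
                            lambda i: 'sig' if i == 0 else None)
```

As in the text, the width of each sub-interval is proportional to the probability of its value, and the lower limit of the final interval identifies the encoded binaries.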
[093] Figure 2 is a block diagram illustrating an example of a video encoder 20 that is configured to implement the technologies of the present invention. Figure 2 is provided for explanatory purposes and should not be construed as limiting the technologies exemplified and generally described in the present invention. For the purpose of explanation, the video encoder 20 is described in the present invention in the context of image prediction in HEVC encoding. However, the technologies of the present invention are applicable to other coding standards or methods.

[094] In the example of figure 2, the video encoder 20 includes a prediction processing unit 100, a residual generation unit 102, a transform processing unit 104, a quantization unit 106, an inverse quantization unit 108, an inverse transform processing unit 110, a reconstruction unit 112, a filter unit 113, a decoded picture buffer 114 and an entropy coding unit 116. The entropy coding unit 116 includes a regular CABAC encoding mechanism 118 and a bypass encoding mechanism 120. The prediction processing unit 100 includes an inter prediction processing unit 121 and an intra prediction processing unit 126. The inter prediction processing unit 121 includes a motion estimation unit 122 and a motion compensation unit 124. In other examples, the video encoder 20 may include more, fewer or different functional components.

[095] The video encoder 20 receives video data. To encode the video data, the video encoder 20 can encode each slice of each image of the video data. As a part of encoding a slice, the video encoder 20 can encode each CTB in the slice. As a part of encoding a CTB, the prediction processing unit 100 can perform quaternary tree partitioning on the pixel block associated with the CTB to progressively partition the pixel block into smaller pixel blocks. The smaller pixel blocks can be associated with CUs. For example, the prediction processing unit 100 can partition the pixel block of a CTB into four equally sized sub-blocks, partition one or more of the sub-blocks into four new equally sized sub-blocks, and so on.

[096] The video encoder 20 can encode the CUs of a CTB in an image to generate encoded representations of the CUs (that is, encoded CUs). The video encoder 20 can encode the CUs of a CTB in z scan order. In other words, the video encoder 20 can sequentially encode an upper left CU, an upper right CU, a lower left CU and then a lower right CU. When the video encoder 20 encodes a partitioned CU, the video encoder 20 can encode, in z scan order, the CUs associated with the sub-blocks of the pixel block of the partitioned CU.

[097] Furthermore, as a part of encoding a CU, the prediction processing unit 100 can partition the pixel block of the CU among one or more PUs of the CU. The video encoder 20 and the video decoder 30 can support various PU sizes. Assuming that the size of a particular CU is 2Nx2N, the video encoder 20 and the video decoder 30 can support a PU size of 2Nx2N or NxN for intra prediction, and symmetric PU sizes of 2Nx2N, 2NxN, Nx2N, NxN or the like for inter prediction. The video encoder 20 and the video decoder 30 can also support asymmetric partitioning for PU sizes of 2NxnU, 2NxnD, nLx2N and nRx2N for inter prediction.
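For illustration only, the recursive partitioning and z scan order of paragraphs [095] and [096] can be sketched as follows in Python; the block sizes and the split rule passed in are assumptions, not the actual HEVC mode decision.

```python
# Recursive quaternary tree partitioning with z scan emission: each block
# either becomes a CU or is split into four equally sized sub-blocks that
# are visited upper left, upper right, lower left, lower right.

def z_scan_cus(x, y, size, min_size, should_split):
    if size > min_size and should_split(x, y, size):
        half = size // 2
        for dy in (0, half):            # upper row first, then lower row
            for dx in (0, half):        # left before right -> z order
                yield from z_scan_cus(x + dx, y + dy, half, min_size, should_split)
    else:
        yield (x, y, size)              # an unpartitioned CU

# e.g. split every block larger than 16 inside a 64x64 CTB:
cus = list(z_scan_cus(0, 0, 64, 16, lambda x, y, s: s > 16))  # 16 CUs of size 16
```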
[098] The inter prediction processing unit 121 can generate predictive data for a PU by performing inter prediction on each PU of a CU. The predictive data for the PU can include a predictive pixel block that corresponds to the PU and motion information of the PU. Slices can be I slices, P slices or B slices. The inter prediction processing unit 121 can perform different operations for a PU of a CU depending on whether the PU is in an I slice, a P slice or a B slice. In an I slice, all PUs are intra predicted. Therefore, if a PU is in an I slice, the inter prediction processing unit 121 does not perform inter prediction on the PU.

[099] If a PU is in a P slice, the motion estimation unit 122 can search the reference images in a reference image list (such as "list 0") for a reference block of the PU. The reference block of the PU can be the pixel block that most closely matches the pixel block of the PU. The motion estimation unit 122 can generate a reference image index that indicates the reference image in list 0 that includes the reference block of the PU, and a motion vector that indicates a spatial displacement between the pixel block of the PU and the reference block. The motion estimation unit 122 can output the reference image index and the motion vector as the motion information of the PU. The motion compensation unit 124 can generate the predictive pixel block of the PU based on the reference block indicated by the motion information of the PU.

[0100] If a PU is in a B slice, the motion estimation unit 122 can perform unidirectional inter prediction or bidirectional inter prediction on the PU. To perform unidirectional inter prediction on the PU, the motion estimation unit 122 can search the reference images in a first reference image list ("list 0") or in a second reference image list ("list 1") for a reference block of the PU. The motion estimation unit 122 can output the following as the motion information of the PU: a reference image index that indicates the position in list 0 or list 1 of the reference image that includes the reference block, a motion vector that indicates a spatial displacement between the pixel block of the PU and the reference block, and a prediction direction indicator that indicates whether the reference image is in list 0 or in list 1.

[0101] To perform bidirectional inter prediction on a PU, the motion estimation unit 122 can search the reference images in list 0 for a reference block of the PU and can also search the reference images in list 1 for another reference block of the PU. The motion estimation unit 122 can generate reference image indexes that indicate the positions in list 0 and in list 1 of the reference images that include the reference blocks. Furthermore, the motion estimation unit 122 can generate motion vectors that indicate spatial displacements between the reference blocks and the pixel block of the PU. The motion information of the PU can include the reference image indexes and the motion vectors of the PU. The motion compensation unit 124 can generate the predictive pixel block of the PU based on the reference blocks indicated by the motion information of the PU.

[0102] The intra prediction processing unit 126 can generate predictive data for a PU by performing intra prediction on the PU. The predictive data of the PU can include a predictive pixel block of the PU and various syntax elements. The intra prediction processing unit 126 can perform intra prediction on PUs in I slices, P slices and B slices.

[0103] To perform intra prediction on a PU, the intra prediction processing unit 126 can generate multiple sets of predictive data for the PU by using multiple intra prediction modes. To generate a set of predictive data for the PU by using an intra prediction mode, the intra prediction processing unit 126 can extend samples from the sample blocks of adjacent PUs across the sample blocks of the PU in the direction associated with the intra prediction mode. The adjacent PUs can be at the top, top right, top left or left side of the PU, assuming that a left-to-right, top-to-bottom encoding order is used for PUs, CUs and CTBs. The intra prediction processing unit 126 can use various numbers of intra prediction modes, for example, 33 directional intra prediction modes. In some examples, the number of intra prediction modes may depend on the size of the pixel block of the PU.
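As an illustration of the sample extension just described in [0103], the following Python sketch generates a predicted block for the two simplest directional cases; the inputs and mode names are assumptions made for the example, and HEVC's angular modes additionally interpolate reference samples at fractional positions.

```python
# Simplified directional intra prediction: adjacent reconstructed samples
# (the row above and the column to the left) are extended across the block.

def intra_predict(top_row, left_col, mode):
    n = len(top_row)
    if mode == 'vertical':       # copy the row of samples above down every row
        return [list(top_row) for _ in range(n)]
    if mode == 'horizontal':     # copy the left column across every column
        return [[v] * n for v in left_col]
    raise ValueError('only two illustrative modes are sketched here')

pred = intra_predict([100, 102, 104, 106], [99, 101, 103, 105], 'vertical')
```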
[0104] The prediction processing unit 100 can select the predictive data for the PUs of a CU from among the predictive data generated by the inter prediction processing unit 121 for the PUs and the predictive data generated by the intra prediction processing unit 126 for the PUs. In some examples, the prediction processing unit 100 selects the predictive data for the PUs of the CU based on rate/distortion metrics of the sets of predictive data. The predictive pixel blocks of the selected predictive data can be referred to in this document as the selected predictive pixel blocks.

[0105] The residual generation unit 102 can generate, based on the pixel block of a CU and the selected predictive pixel blocks of the PUs of the CU, a residual pixel block of the CU. For example, the residual generation unit 102 can generate the residual pixel block of the CU so that each sample in the residual pixel block has a value equal to the difference between a sample in the pixel block of the CU and the corresponding sample in the selected predictive pixel block of a PU of the CU.

[0106] The prediction processing unit 100 can perform quaternary tree partitioning to partition the residual pixel block of the CU into sub-blocks. Each unpartitioned residual pixel block can be associated with a different TU of the CU. The sizes and positions of the residual pixel blocks associated with the TUs of the CU may or may not be based on the sizes and positions of the pixel blocks of the PUs of the CU.

[0107] Because each of the pixels in the residual pixel blocks of the TUs can include a luminance sample and two chrominance samples, each of the TUs can be associated with one block of luminance samples and two blocks of chrominance samples. The transform processing unit 104 can generate a coefficient block for each TU of the CU by applying one or more transforms to the residual sample block associated with the TU. The transform processing unit 104 can apply various transforms to the residual sample block associated with the TU. For example, the transform processing unit 104 can apply a discrete cosine transform (DCT), a directional transform, or a conceptually similar transform to the residual sample block.

[0108] The quantization unit 106 can quantize the coefficients in a coefficient block. The quantization process can reduce the bit depth associated with all or some of the coefficients. For example, an n-bit coefficient can be rounded down to an m-bit coefficient during quantization, where n is greater than m. The quantization unit 106 can quantize a coefficient block associated with a TU of a CU based on a quantization parameter (QP) value associated with the CU. The video encoder 20 can adjust the degree of quantization applied to the coefficient blocks associated with the CU by adjusting the QP value associated with the CU.

[0109] The inverse quantization unit 108 and the inverse transform processing unit 110 can apply inverse quantization and an inverse transform, respectively, to a coefficient block to reconstruct a residual sample block from the coefficient block.
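The quantization and inverse quantization of [0108] and [0109] can be sketched as follows. This is a minimal illustration assuming a step size that roughly doubles every 6 QP points; HEVC's exact scaling lists and rounding offsets are omitted.

```python
# Uniform quantization controlled by QP: a larger QP means a larger step,
# fewer bits per coefficient, and more loss. Dequantization mirrors the
# step, which is why reconstruction is only approximate.

def q_step(qp):
    return 2.0 ** ((qp - 4) / 6.0)   # approximate HEVC quantization step

def quantize(coeffs, qp):
    return [round(c / q_step(qp)) for c in coeffs]

def dequantize(levels, qp):
    return [lv * q_step(qp) for lv in levels]

levels = quantize([52.0, -7.0, 3.0, 0.4], qp=22)  # [6, -1, 0, 0] with step size 8
recon = dequantize(levels, qp=22)                 # close to, not equal to, the input
```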
The reconstruction unit 112 can add samples of the reconstructed residual sample block to corresponding samples of one or more predictive sample blocks generated by the prediction processing unit 100 to generate a reconstructed sample block associated with a TU. By reconstructing sample blocks for each TU of a CU in this way, the video encoder 20 can reconstruct the pixel block of the CU.

[0110] The filter unit 113 can perform a deblocking operation to reduce blocking artifacts in the pixel block associated with the CU. Furthermore, the filter unit 113 can apply an SAO offset determined by the prediction processing unit 100 to the reconstructed sample block to restore the pixel block. The filter unit 113 can generate a sequence of SAO syntax elements for a CTB. The SAO syntax elements can include regular CABAC encoded binaries and bypass encoded binaries. According to the technologies of the present invention, within the sequence, none of the bypass encoded binaries for a color component is between two regular CABAC encoded binaries for the same color component.

[0111] The decoded picture buffer 114 can store the reconstructed pixel block. The inter prediction processing unit 121 can perform inter prediction on a PU of another image by using a reference image that includes the reconstructed pixel block. Furthermore, the intra prediction processing unit 126 can perform intra prediction on another PU in the same image as the CU by using the reconstructed pixel block in the decoded picture buffer 114.

[0112] The entropy coding unit 116 can receive data from other functional components of the video encoder 20. For example, the entropy coding unit 116 can receive coefficient blocks from the quantization unit 106 and can receive syntax elements from the prediction processing unit 100. The entropy coding unit 116 can perform one or more entropy coding operations on the data to generate entropy encoded data. For example, the entropy coding unit 116 can perform a context-adaptive variable-length coding (CAVLC) operation, a CABAC operation, a variable-to-variable (V2V) length coding operation, a syntax-based context-adaptive binary arithmetic coding (SBAC) operation, a probability interval partitioning entropy (PIPE) coding operation, or another type of entropy coding operation on the data. In a particular example, the entropy coding unit 116 can encode the SAO syntax elements generated by the filter unit 113. As a part of encoding the SAO syntax elements, the entropy coding unit 116 can encode the regular CABAC encoded binaries of the SAO syntax elements by using the regular CABAC encoding mechanism 118 and can encode the bypass encoded binaries by using the bypass encoding mechanism 120.

[0113] According to the technologies of the present invention, the inter prediction processing unit 121 determines a set of candidate inter-frame prediction modes. Thus, the video encoder 20 is an example of a video encoder.
In accordance with the technologies of the present invention, the video encoder is configured to: determine, according to information about image units adjacent to an image unit to be processed, whether a set of candidate prediction modes for the image unit to be processed includes an affine fusion mode, where the affine fusion mode indicates that respective predicted images of the image unit to be processed and the adjacent image units of the image unit to be processed are obtained by using the same affine model; determine, in the set of candidate prediction modes, a prediction mode for the image unit to be processed; determine a predicted image of the image unit to be processed according to the prediction mode; and encode first indication information into a bit stream, where the first indication information indicates the prediction mode.

[0114] Figure 3 is a flow chart illustrating an example operation 200 of a video encoder for encoding video data, according to one or more technologies of the present invention. Figure 3 is provided as an example. In other examples, the technologies of the present invention can be implemented by using more, fewer or different steps than those shown in the example in figure 3. According to the example method in figure 3, the video encoder 20 performs the following steps.

[0115] S210. Determine, according to information about image units adjacent to an image unit to be processed, whether a set of candidate prediction modes for the image unit to be processed includes an affine fusion mode.

[0116] Specifically, as shown in figure 4, blocks A, B, C, D and E are adjacent reconstructed blocks of a current block to be encoded, and are respectively located at the top, left, top right, bottom left and top left of the block to be encoded. It can be determined, according to the encoding information of the adjacent reconstructed blocks, whether the set of candidate prediction modes for the current block to be encoded includes an affine fusion mode.

[0117] It should be understood that figure 4 in this embodiment of the present invention shows the number and positions of the reconstructed blocks adjacent to the block to be encoded for purposes of illustration. The number of adjacent reconstructed blocks can be more or less than five, and no limitations are imposed on this.

[0118] In a first possible implementation, it is determined whether there is a block whose prediction type is affine prediction among the adjacent reconstructed blocks. If there is no block whose prediction type is affine prediction among the adjacent reconstructed blocks, the set of candidate prediction modes for the block to be encoded does not include the affine fusion mode; or if there is a block whose prediction type is affine prediction among the adjacent reconstructed blocks, the encoding process shown in figure 2 is performed separately for two cases: the set of candidate prediction modes for the block to be encoded includes the affine fusion mode, or the set of candidate prediction modes for the block to be encoded does not include the affine fusion mode. If the encoding performance of the first case is better, the set of candidate prediction modes for the block to be encoded includes the affine fusion mode, and indication information, which may be assumed to be second indication information, is set to 1 and is encoded into the bit stream. Otherwise, the set of candidate prediction modes for the block to be encoded does not include the affine fusion mode, and the second indication information is set to 0 and is encoded into the bit stream.
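A minimal sketch of the two-case trial in the first possible implementation is given below; rd_cost_with_modes and write_flag are hypothetical helpers, not part of the described encoder, and the cost comparison stands in for the "encoding performance" comparison of [0118].

```python
# The encoder runs the encoding process once with the affine fusion mode in
# the candidate set and once without, keeps the cheaper variant, and signals
# the choice as the second indication information.

def choose_candidate_set(block, base_modes, affine_merge,
                         rd_cost_with_modes, write_flag):
    cost_with = rd_cost_with_modes(block, base_modes + [affine_merge])
    cost_without = rd_cost_with_modes(block, base_modes)
    include = cost_with < cost_without
    write_flag(1 if include else 0)       # second indication information
    return base_modes + [affine_merge] if include else base_modes
```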
[0119] In a second possible implementation, it is determined whether there is a block whose prediction type is affine prediction among the adjacent reconstructed blocks. If there is no block whose prediction type is affine prediction among the adjacent reconstructed blocks, the set of candidate prediction modes for the block to be encoded does not include the affine fusion mode; or if there is a block whose prediction type is affine prediction among the adjacent reconstructed blocks, the set of candidate prediction modes for the block to be encoded includes the affine fusion mode.

[0120] In a third possible implementation, the adjacent reconstructed blocks may use multiple affine modes, which include, for example, a first affine mode or a second affine mode, and correspondingly the affine fusion mode includes a first affine fusion mode that merges the first affine mode or a second affine fusion mode that merges the second affine mode. The quantities of first affine modes, second affine modes and non-affine modes among the adjacent reconstructed blocks are obtained separately by means of statistical collection. When the first affine mode is ranked first in quantity among the adjacent reconstructed blocks, the set of candidate prediction modes includes the first affine fusion mode and does not include the second affine fusion mode. When the second affine mode is ranked first in quantity among the adjacent reconstructed blocks, the set of candidate prediction modes includes the second affine fusion mode and does not include the first affine fusion mode. When the non-affine mode is ranked first in quantity among the adjacent reconstructed blocks, the set of candidate prediction modes does not include the affine fusion mode.

[0121] Alternatively, in the third possible implementation, when the first affine mode is ranked first in quantity among the adjacent reconstructed blocks, the set of candidate prediction modes includes the first affine fusion mode and does not include the second affine fusion mode. When the second affine mode is ranked first in quantity among the adjacent reconstructed blocks, the set of candidate prediction modes includes the second affine fusion mode and does not include the first affine fusion mode. When the non-affine mode is ranked first in quantity among the adjacent reconstructed blocks, it is determined by means of statistical collection whether the first affine mode or the second affine mode is ranked second in quantity among the adjacent reconstructed blocks. When the first affine mode is ranked second in quantity among the adjacent reconstructed blocks, the set of candidate prediction modes includes the first affine fusion mode and does not include the second affine fusion mode. When the second affine mode is ranked second in quantity among the adjacent reconstructed blocks, the set of candidate prediction modes includes the second affine fusion mode and does not include the first affine fusion mode.
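The statistical collection of the third possible implementation can be sketched as follows; the mode labels are illustrative assumptions, and the second-ranked fallback corresponds to the alternative variant in [0121].

```python
from collections import Counter

# Count the prediction modes of the adjacent reconstructed blocks and return
# which affine fusion mode, if any, enters the candidate set.

def affine_merge_candidate(neighbor_modes, use_second_rank_fallback=False):
    counts = Counter(neighbor_modes)
    ranking = [m for m, _ in counts.most_common()]
    first = ranking[0] if ranking else None
    if first == 'affine1':
        return 'affine_merge1'
    if first == 'affine2':
        return 'affine_merge2'
    if use_second_rank_fallback and len(ranking) > 1:  # non-affine ranked first
        second = ranking[1]
        if second == 'affine1':
            return 'affine_merge1'
        if second == 'affine2':
            return 'affine_merge2'
    return None  # the candidate set contains no affine fusion mode

mode = affine_merge_candidate(['affine1', 'non_affine', 'affine1'])  # 'affine_merge1'
```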
[0122] In a fourth possible implementation, it is determined whether two conditions are met: (1) there is a block whose prediction type is an affine mode among the adjacent reconstructed blocks; and (2) the width and height of the adjacent block in the affine mode are less than the width and height of the block to be encoded. If either condition is not met, the set of candidate prediction modes for the block to be encoded does not include the affine fusion mode. If both conditions are met, the encoding process shown in figure 2 is performed separately for two cases: the set of candidate prediction modes for the block to be encoded includes the affine fusion mode, or the set of candidate prediction modes for the block to be encoded does not include the affine fusion mode. If the encoding performance of the first case is better, the set of candidate prediction modes for the block to be encoded includes the affine fusion mode, and indication information, which may be assumed to be third indication information, is set to 1 and is encoded into the bit stream. Otherwise, the set of candidate prediction modes for the block to be encoded does not include the affine fusion mode, and the third indication information is set to 0 and is encoded into the bit stream.

[0123] It should be understood that the determination condition (2) in this embodiment of the present invention means that the width of the adjacent block in the affine mode is less than the width of the block to be encoded and the height of the adjacent block in the affine mode is less than the height of the block to be encoded. In another embodiment, alternatively, the determination condition can be: the width of the adjacent block in the affine mode is less than the width of the block to be encoded, or the height of the adjacent block in the affine mode is less than the height of the block to be encoded, and no limitations are imposed on this.

[0124] In a fifth possible implementation, it is determined whether two conditions are met: (1) there is a block whose prediction type is an affine mode among the adjacent reconstructed blocks; and (2) the width and height of the adjacent block in the affine mode are less than the width and height of the block to be encoded. If either condition is not met, the set of candidate prediction modes for the block to be encoded does not include the affine fusion mode. If both conditions are met, the set of candidate prediction modes for the block to be encoded includes the affine fusion mode.

[0125] It should be understood that, in this embodiment of the present invention, the prediction types and sizes of the adjacent reconstructed blocks are used as the basis for determining the set of candidate prediction modes for the current block to be encoded, and attribute information of the adjacent reconstructed blocks obtained through parsing can also be used for the determination. No limitation is imposed in this document.
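The two conditions shared by the fourth and fifth possible implementations can be sketched as follows; the neighbor attributes are illustrative assumptions, and the stricter "and" form of condition (2) from [0123] is used.

```python
from collections import namedtuple

Neighbor = namedtuple('Neighbor', 'is_affine width height')

def affine_merge_allowed(neighbors, cur_w, cur_h):
    for nb in neighbors:
        # condition (1): the adjacent reconstructed block uses an affine mode;
        # condition (2): it is smaller than the current block in both
        # dimensions (the text notes an 'or' variant is also possible).
        if nb.is_affine and nb.width < cur_w and nb.height < cur_h:
            return True
    return False

ok = affine_merge_allowed([Neighbor(True, 16, 16)], cur_w=32, cur_h=32)  # True
```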
[0126] It should be further understood that, in the various possible implementations of this embodiment of the present invention, for purposes of illustration, such as in the second possible implementation, the following determination criterion can be used to determine whether there is a block whose prediction type is affine prediction among the adjacent reconstructed blocks. For purposes of illustration, when the prediction types of at least two adjacent blocks are the affine mode, the set of candidate prediction modes for the block to be encoded includes the affine fusion mode; otherwise, the set of candidate prediction modes for the block to be encoded does not include the affine fusion mode. Alternatively, the number of adjacent blocks whose prediction type is the affine mode can also be at least three or at least four, and no limitations are imposed on this.

[0127] It should be further understood that, in the various possible implementations of this embodiment of the present invention, for purposes of illustration, as in the fifth possible implementation, it is determined whether two conditions are met: (1) there is a block whose prediction type is an affine mode among the adjacent reconstructed blocks; and (2) the width and height of the adjacent block in the affine mode are less than the width and height of the block to be encoded. The second determination condition, for illustration purposes, can also be whether the width and height of the adjacent block in the affine mode are less than 1/2, 1/3 or 1/4 of the width and height of the block to be encoded, and no limitations are imposed on this.

[0128] It should be further understood that, in this embodiment of the present invention, setting the indication information to 0 or 1 is for purposes of illustration. Alternatively, the reverse configuration can be used. For purposes of illustration, for example, in the first possible implementation, it can be determined whether there is a block whose prediction type is affine prediction among the adjacent reconstructed blocks. If there is no block whose prediction type is affine prediction among the adjacent reconstructed blocks, the set of candidate prediction modes for the block to be encoded does not include the affine fusion mode; or if there is a block whose prediction type is affine prediction among the adjacent reconstructed blocks, the encoding process shown in figure 2 is performed separately for two cases: the set of candidate prediction modes for the block to be encoded includes the affine fusion mode, or the set of candidate prediction modes for the block to be encoded does not include the affine fusion mode. If the encoding performance of the first case is better, the set of candidate prediction modes for the block to be encoded includes the affine fusion mode, and indication information, which may be assumed to be the second indication information, is set to 0 and is encoded into the bit stream. Otherwise, the set of candidate prediction modes for the block to be encoded does not include the affine fusion mode, and the second indication information is set to 1 and is encoded into the bit stream.

[0129] S220. Determine, in the set of candidate prediction modes, a prediction mode for the image unit to be processed.

[0130] The set of candidate prediction modes is the set of candidate prediction modes determined in S210. Each prediction mode in the set of candidate prediction modes is used in turn to perform the encoding process shown in figure 2, so as to select the mode with the best encoding performance as the prediction mode for the block to be encoded.
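A hedged sketch of the selection in S220 follows; encode_with_mode is a hypothetical helper returning a (distortion, bits) pair, and the Lagrangian cost D + λ·R stands in for the performance/cost comparison discussed in [0131] below.

```python
# Try every candidate prediction mode and keep the one minimizing a
# rate-distortion cost; this is one common way of comparing performance
# (restoration quality) against cost (encoding bit rate).

def select_prediction_mode(block, candidate_modes, encode_with_mode, lam):
    best_mode, best_cost = None, float('inf')
    for mode in candidate_modes:
        distortion, bits = encode_with_mode(block, mode)
        cost = distortion + lam * bits       # performance/cost trade-off
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode
```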
[0131] It should be understood that, in this embodiment of the present invention, the objective of performing the encoding process shown in figure 2 is to select a prediction mode with the best encoding performance. In the selection process, the performance/cost ratios of the prediction modes can be compared, where performance is indicated by the image restoration quality and cost is indicated by the encoding bit rate. Alternatively, only the performance or only the cost of the prediction modes can be compared. Correspondingly, all the encoding steps shown in figure 2 can be completed, or the encoding process can be stopped once the indicators that need to be compared have been obtained. For example, if the prediction modes are compared only in terms of performance, the encoding process can be stopped after the prediction unit completes its steps, and no limitations are imposed on this.

[0132] S230. Determine a predicted image of the image unit to be processed according to the prediction mode.

[0133] The H.265 standard and application files such as the previously mentioned CN201010247275.7 describe in detail the process in which a predicted image of a block to be encoded is generated according to a prediction mode, including a prediction mode based on a translational model, an affine prediction mode, an affine fusion mode or the like, and details are not described here again.

[0134] S240. Encode first indication information into a bit stream.

[0135] The prediction mode determined in S220 is encoded into the bit stream. It should be understood that this step can be performed at any time after S220, and no particular limitation is imposed on the order of the steps, as long as the step corresponds to the step of decoding the first indication information on the decoding side.

[0136] Figure 5 is a block diagram illustrating an example of another video encoder 40 for encoding video data, according to one or more technologies of the present invention.

[0137] The video encoder 40 includes: a first determination module 41, a second determination module 42, a third determination module 43 and an encoding module 44.

[0138] The first determination module 41 is configured to execute S210 to determine, according to information about image units adjacent to an image unit to be processed, whether a set of candidate prediction modes for the image unit to be processed includes an affine fusion mode.

[0139] The second determination module 42 is configured to execute S220 to determine, in the set of candidate prediction modes, a prediction mode for the image unit to be processed.

[0140] The third determination module 43 is configured to execute S230 to determine a predicted image of the image unit to be processed according to the prediction mode.

[0141] The encoding module 44 is configured to execute S240 to encode first indication information into a bit stream.

[0142] Because the motion information of adjacent blocks is correlated, there is a very high probability that a current block and an adjacent block have the same or a similar prediction mode. In this embodiment of the present invention, the prediction mode information of the current block is derived from information about the adjacent blocks, reducing the bit rate of encoding the prediction mode, thereby improving encoding efficiency.

[0143] Figure 6 is a flow chart illustrating an example operation 300 of a video encoder for encoding video data, according to one or more technologies of the present invention. Figure 6 is provided as an example. In other examples, the technologies of the present invention can be implemented by using more, fewer or different steps than those shown in the example in figure 6. According to the example method in figure 6, the video encoder 20 performs the following steps.
[0144] S310. Encode indication information of a set of candidate prediction modes for a first image area to be processed.

[0145] When a set of candidate translational modes is used as the set of candidate modes for the first image area to be processed, the first indication information is set to 0 and the first indication information is encoded into a bit stream, where a translational mode indicates a prediction mode in which a predicted image is obtained by using a translational model. When a set of candidate translational modes and a set of candidate affine modes are used as the set of candidate modes for the first image area to be processed, the first indication information is set to 1 and the first indication information is encoded into the bit stream, where an affine mode indicates a prediction mode in which a predicted image is obtained by using an affine model. The first image area to be processed can be any one of an image frame group, an image frame, an image tile set, an image slice set, an image tile, an image slice, an image encoding unit set or an image encoding unit. Correspondingly, the first indication information is encoded into a header of the image frame group, for example, a video parameter set (VPS), a sequence parameter set (SPS) or supplemental enhancement information (SEI); into a header of the image frame, for example, an image parameter set (PPS); into a header of the image tile set, a header of the image slice set, a header of the image tile (tile header) or a header of the image slice (slice header); or into a header of the image encoding unit set or a header of the image encoding unit.

[0146] It should be understood that the first image area to be processed in this step can be pre-configured, or can be adaptively determined in the encoding process. A representation of the range of the first image area to be processed can be known from an encoding/decoding side protocol, or the range of the first image area to be processed can be encoded into the bit stream for transmission, and no limitation is imposed on this.

[0147] It should be further understood that the set of candidate prediction modes can be pre-configured, or can be determined after comparing encoding performance, and no limitations are imposed on this.

[0148] It should be further understood that, in this embodiment of the present invention, setting the indication information to 0 or 1 is for purposes of illustration. Alternatively, the reverse configuration can be used.

[0149] S320. Determine, in the set of candidate prediction modes for the first image area to be processed, a prediction mode for an image unit to be processed in the first image area to be processed.

[0150] A specific method is similar to S220, and details are not described here again.

[0151] S330. Determine a predicted image of the image unit to be processed according to the prediction mode.

[0152] A specific method is similar to S230, and details are not described here again.

[0153] S340. Encode the selected prediction mode for the image unit to be processed into the bit stream.

[0154] A specific method is similar to S240, and details are not described here again.

[0155] Figure 7 is a block diagram illustrating an example of another video encoder 50 for encoding video data, according to one or more technologies of the present invention.
[0156] The video encoder 50 includes: a first encoding module 51, a first determination module 52, a second determination module 53 and a second encoding module 54.

[0157] The first encoding module 51 is configured to execute S310 to encode indication information of a set of candidate prediction modes for a first image area to be processed.

[0158] The first determination module 52 is configured to execute S320 to determine, in the set of candidate prediction modes for the first image area to be processed, a prediction mode for an image unit to be processed in the first image area to be processed.

[0159] The second determination module 53 is configured to execute S330 to determine a predicted image of the image unit to be processed according to the prediction mode.

[0160] The second encoding module 54 is configured to execute S340 to encode the selected prediction mode for the image unit to be processed into the bit stream.

[0161] Because the motion information of adjacent blocks is correlated, there is a very high probability that only translational motion and no affine motion exists within a same area. In this embodiment of the present invention, a set of candidate prediction modes marking an area level is established, avoiding the bit rate of encoding a redundant mode, thereby improving encoding efficiency.

[0162] Figure 8 is a block diagram illustrating an example of a video decoder 30 that is configured to implement the technologies of the present invention. Figure 8 is provided for explanatory purposes and should not be construed as limiting the technologies exemplified and generally described in the present invention. For the purpose of explanation, the video decoder 30 is described in the present invention in the context of image prediction in HEVC encoding. However, the technologies of the present invention are applicable to other coding standards or methods.

[0163] In the example of figure 8, the video decoder 30 includes an entropy decoding unit 150, a prediction processing unit 152, an inverse quantization unit 154, an inverse transform processing unit 156, a reconstruction unit 158, a filter unit 159 and a decoded picture buffer 160. The prediction processing unit 152 includes a motion compensation unit 162 and an intra prediction processing unit 164. The entropy decoding unit 150 includes a regular CABAC encoding mechanism 166 and a bypass encoding mechanism 168. In other examples, the video decoder 30 may include more, fewer or different functional components.

[0164] The video decoder 30 can receive a bit stream. The entropy decoding unit 150 can parse the bit stream to extract syntax elements from the bit stream. As a part of parsing the bit stream, the entropy decoding unit 150 can entropy decode the entropy-encoded syntax elements in the bit stream. The prediction processing unit 152, the inverse quantization unit 154, the inverse transform processing unit 156, the reconstruction unit 158 and the filter unit 159 can generate decoded video data based on the syntax elements extracted from the bit stream.

[0165] The bit stream can include a sequence of encoded SAO syntax elements of a CTB. The SAO syntax elements can include regular CABAC encoded binaries and bypass encoded binaries. According to the technologies of the present invention, within the sequence of encoded SAO syntax elements, none of the bypass encoded binaries is between two of the regular CABAC encoded binaries. The entropy decoding unit 150 can decode the SAO syntax elements.
As a part of decoding the SAO syntax elements, the entropy decoding unit 150 can decode the regular CABAC encoded binaries by using the regular CABAC encoding mechanism 166 and can decode the bypass encoded binaries by using the bypass encoding mechanism 168.

[0166] Furthermore, the video decoder 30 can perform a reconstruction operation on an unpartitioned CU. To perform the reconstruction operation on the unpartitioned CU, the video decoder 30 can perform a reconstruction operation on each TU of the CU. By performing the reconstruction operation on each TU of the CU, the video decoder 30 can reconstruct the residual pixel blocks associated with the CU.

[0167] As a part of performing the reconstruction operation on a TU of a CU, the inverse quantization unit 154 can inversely quantize (that is, dequantize) the coefficient blocks associated with the TU. The inverse quantization unit 154 can determine the degree of quantization by using the QP value associated with the CU of the TU, and can likewise determine the degree of inverse quantization to be applied by the inverse quantization unit 154.

[0168] After the inverse quantization unit 154 inversely quantizes the coefficient blocks, the inverse transform processing unit 156 can apply one or more inverse transforms to the coefficient blocks to generate the residual sample blocks associated with the TU. For example, the inverse transform processing unit 156 can apply an inverse DCT, an inverse integer transform, an inverse Karhunen-Loeve transform (KLT), an inverse rotational transform, an inverse directional transform, or another inverse transform to the coefficient blocks.

[0169] If a PU is encoded by means of intra prediction, the intra prediction processing unit 164 can perform intra prediction to generate a predictive sample block for the PU. The intra prediction processing unit 164 can use an intra prediction mode to generate the predictive pixel block of the PU based on the pixel blocks of spatially adjacent PUs. The intra prediction processing unit 164 can determine the intra prediction mode for the PU based on one or more syntax elements obtained from the bit stream by means of parsing.

[0170] The motion compensation unit 162 can construct a first reference image list (list 0) and a second reference image list (list 1) based on the syntax elements extracted from the bit stream. Furthermore, if a PU is encoded by means of inter prediction, the entropy decoding unit 150 can extract the motion information of the PU. The motion compensation unit 162 can determine one or more reference blocks for the PU based on the motion information of the PU. The motion compensation unit 162 can generate a predictive pixel block for the PU based on the one or more reference blocks of the PU.

[0171] The reconstruction unit 158 can use the residual pixel blocks associated with the TUs of a CU and the predictive pixel blocks (that is, intra prediction data or inter prediction data) of the PUs of the CU, where applicable, to reconstruct the pixel block of the CU. In particular, the reconstruction unit 158 can add samples of the residual pixel blocks to corresponding samples of the predictive pixel blocks to reconstruct the pixel block of the CU.

[0172] The filter unit 159 can perform a deblocking operation to reduce blocking artifacts associated with the pixel block of the CU of a CTB. Furthermore, the filter unit 159 can modify the pixel block of the CTB based on the SAO syntax elements parsed from the bit stream.
For example, the filter unit 159 can determine values based on the SAO syntax elements of the CTB, and add the determined values to the samples in the reconstructed pixel block of the CTB. By modifying at least some of the pixel blocks of the CTBs of an image, the filter unit 159 can modify a reconstructed image of the video data based on the SAO syntax elements.

[0173] The video decoder 30 can store the pixel block of the CU in the decoded picture buffer 160. The decoded picture buffer 160 can provide reference images for subsequent motion compensation, intra prediction and presentation on a display device (such as the display device 32 in figure 1). For example, the video decoder 30 can perform, based on the pixel blocks in the decoded picture buffer 160, an intra prediction or inter prediction operation on a PU of another CU.

[0174] In accordance with the technologies of the present invention, the prediction processing unit 152 determines a set of candidate inter-frame prediction modes. Thus, the video decoder 30 is an example of a video decoder. In accordance with the technologies of the present invention, the video decoder is configured to: determine, according to information about image units adjacent to an image unit to be processed, whether a set of candidate prediction modes for the image unit to be processed includes an affine fusion mode, where the affine fusion mode indicates that respective predicted images of the image unit to be processed and the adjacent image units of the image unit to be processed are obtained by using the same affine model; parse a bit stream to obtain first indication information; determine, in the set of candidate prediction modes, a prediction mode for the image unit to be processed according to the first indication information; and determine a predicted image of the image unit to be processed according to the prediction mode.

[0175] Figure 9 is a flow chart illustrating an example operation 400 of a video decoder for decoding video data, according to one or more technologies of the present invention. Figure 9 is provided as an example. In other examples, the technologies of the present invention can be implemented by using more, fewer or different steps than those shown in the example in figure 9. According to the example method in figure 9, the video decoder 30 performs the following steps.

[0176] S410. Determine, according to information about image units adjacent to an image unit to be processed, whether a set of candidate prediction modes for the image unit to be processed includes an affine fusion mode.

[0177] Specifically, as shown in figure 4, blocks A, B, C, D and E are adjacent reconstructed blocks of a current block to be decoded, and are respectively located at the top, left, top right, bottom left and top left of the block to be decoded. It can be determined, according to the encoding information of the adjacent reconstructed blocks, whether the set of candidate prediction modes for the current block to be decoded includes an affine fusion mode.

[0178] It should be understood that figure 4 in this embodiment of the present invention shows the number and positions of the reconstructed blocks adjacent to the block to be decoded for purposes of illustration. The number of adjacent reconstructed blocks can be more or less than five, and no limitations are imposed on this.
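For illustration, gathering the information about the adjacent reconstructed blocks of figure 4 might look as follows; get_block and the coordinate convention are assumptions made for this sketch only, not part of the described method.

```python
# Positions of the five adjacent reconstructed blocks relative to the
# current block, in units of the current block size.
NEIGHBOR_OFFSETS = {
    'A': (0, -1),   # top
    'B': (-1, 0),   # left
    'C': (1, -1),   # top right
    'D': (-1, 1),   # bottom left
    'E': (-1, -1),  # top left
}

def adjacent_prediction_info(cur_x, cur_y, get_block):
    info = {}
    for name, (dx, dy) in NEIGHBOR_OFFSETS.items():
        blk = get_block((cur_x + dx, cur_y + dy))   # None if unavailable
        if blk is not None:
            info[name] = blk   # e.g. prediction type, width, height
    return info

blocks = {(4, 3): 'affine', (3, 4): 'translational'}  # toy reconstructed data
info = adjacent_prediction_info(4, 4, blocks.get)     # {'A': 'affine', 'B': 'translational'}
```

The implementations described below all consume this kind of neighbor information.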
[0179] In a first possible implementation, it is determined whether there is a block whose prediction type is affine prediction among the adjacent reconstructed blocks. When the prediction mode of at least one of the adjacent image units is to obtain a predicted image by using an affine model, the bit stream is parsed to obtain second indication information. When the second indication information is 1, the set of candidate prediction modes includes the affine fusion mode; or when the second indication information is 0, the set of candidate prediction modes does not include the affine fusion mode. Otherwise, the set of candidate prediction modes does not include the affine fusion mode.

[0180] In a second possible implementation, it is determined whether there is a block whose prediction type is affine prediction among the adjacent reconstructed blocks. If there is no block whose prediction type is affine prediction among the adjacent reconstructed blocks, the set of candidate prediction modes for the block to be decoded does not include the affine fusion mode; or if there is a block whose prediction type is affine prediction among the adjacent reconstructed blocks, the set of candidate prediction modes for the block to be decoded includes the affine fusion mode.

[0181] In a third possible implementation, the adjacent reconstructed blocks may use multiple affine modes, which include, for example, a first affine mode or a second affine mode, and correspondingly the affine fusion mode includes a first affine fusion mode that merges the first affine mode or a second affine fusion mode that merges the second affine mode. The quantities of first affine modes, second affine modes and non-affine modes among the adjacent reconstructed blocks are obtained separately by means of statistical collection. When the first affine mode is ranked first in quantity among the adjacent reconstructed blocks, the set of candidate prediction modes includes the first affine fusion mode and does not include the second affine fusion mode. When the second affine mode is ranked first in quantity among the adjacent reconstructed blocks, the set of candidate prediction modes includes the second affine fusion mode and does not include the first affine fusion mode. When the non-affine mode is ranked first in quantity among the adjacent reconstructed blocks, the set of candidate prediction modes does not include the affine fusion mode.

[0182] Alternatively, in the third possible implementation, when the first affine mode is ranked first in quantity among the adjacent reconstructed blocks, the set of candidate prediction modes includes the first affine fusion mode and does not include the second affine fusion mode. When the second affine mode is ranked first in quantity among the adjacent reconstructed blocks, the set of candidate prediction modes includes the second affine fusion mode and does not include the first affine fusion mode. When the non-affine mode is ranked first in quantity among the adjacent reconstructed blocks, it is determined by means of statistical collection whether the first affine mode or the second affine mode is ranked second in quantity among the adjacent reconstructed blocks. When the first affine mode is ranked second in quantity among the adjacent reconstructed blocks, the set of candidate prediction modes includes the first affine fusion mode and does not include the second affine fusion mode.
When the second affine mode is ranked second in quantity among the adjacent reconstructed blocks, the set of candidate prediction modes includes the second affine fusion mode and does not include the first affine fusion mode.

[0183] In a fourth possible implementation, it is determined whether two conditions are met: (1) there is a block whose prediction type is an affine mode among the adjacent reconstructed blocks; and (2) the width and height of the adjacent block in the affine mode are less than the width and height of the block to be decoded. If either condition is not met, the set of candidate prediction modes for the block to be decoded does not include the affine fusion mode. If both conditions are met, the bit stream is parsed to obtain third indication information. When the third indication information is 1, the set of candidate prediction modes includes the affine fusion mode; or when the third indication information is 0, the set of candidate prediction modes does not include the affine fusion mode. Otherwise, the set of candidate prediction modes does not include the affine fusion mode.

[0184] It should be understood that the determination condition (2) in this embodiment of the present invention means that the width of the adjacent block in the affine mode is less than the width of the block to be decoded and the height of the adjacent block in the affine mode is less than the height of the block to be decoded. In another embodiment, alternatively, the determination condition can be: the width of the adjacent block in the affine mode is less than the width of the block to be decoded, or the height of the adjacent block in the affine mode is less than the height of the block to be decoded, and no limitations are imposed on this.

[0185] In a fifth possible implementation, it is determined whether two conditions are met: (1) there is a block whose prediction type is an affine mode among the adjacent reconstructed blocks; and (2) the width and height of the adjacent block in the affine mode are less than the width and height of the block to be decoded. If either condition is not met, the set of candidate prediction modes for the block to be decoded does not include the affine fusion mode. If both conditions are met, the set of candidate prediction modes for the block to be decoded includes the affine fusion mode.

[0186] It should be understood that, in this embodiment of the present invention, the prediction types and sizes of the adjacent reconstructed blocks are used as the basis for determining the set of candidate prediction modes for the current block to be decoded, and attribute information of the adjacent reconstructed blocks obtained through parsing can also be used for the determination, as long as the method corresponds to the encoding side. No limitation is imposed in this document.

[0187] It should be further understood that, in the various possible implementations of this embodiment of the present invention, for purposes of illustration, such as in the second possible implementation, the following determination criterion can be used to determine whether there is a block whose prediction type is affine prediction among the adjacent reconstructed blocks.
[0187] It should be further understood that, in the various possible implementations of this embodiment of the present invention, different determination criteria may be used to decide whether there is a block whose prediction type is affine prediction among the adjacent reconstructed blocks; for purposes of illustration, in the second possible implementation the criterion may be the following. When the prediction types of at least two adjacent blocks are the affine mode, the set of candidate prediction modes for the block to be coded includes the affine fusion mode; otherwise, the set of candidate prediction modes for the block to be coded does not include the affine fusion mode. Alternatively, the required number of adjacent blocks whose prediction type is the affine mode may be at least three or at least four, provided that this is consistent with the encoding side; no limitation is imposed on this. [0188] It should be further understood that, in the various possible implementations of this embodiment of the present invention, for example in the fifth possible implementation, it is determined whether two conditions are met: (1) there is a block whose prediction type is an affine mode among the adjacent reconstructed blocks; and (2) the width and height of that adjacent affine block are less than the width and height of the block to be coded. For purposes of illustration, the second determination condition may also be whether the width and height of the adjacent affine block are less than 1/2, 1/3 or 1/4 of the width and height of the block to be coded, provided that this is consistent with the encoding side; no limitation is imposed on this. [0189] It should be further understood that, in this embodiment of the present invention, whether the indication information is set to 0 or 1 corresponds to the encoding side. [0190] S420. Parse a bit stream to obtain first indication information. [0191] The first indication information indicates indexing information of a prediction mode for a block to be decoded. This step corresponds to step S240 on the encoding side. [0192] S430. Determine, in the set of candidate prediction modes, a prediction mode for the image unit to be processed according to the first indication information. [0193] Different sets of candidate prediction modes correspond to different prediction-mode lists. The prediction-mode list corresponding to the set of candidate prediction modes determined in S410 is searched according to the indexing information obtained in S420, so that the prediction mode for the block to be decoded can be found. [0194] S440. Determine a predicted image of the image unit to be processed according to the prediction mode. [0195] A specific method is similar to S230, and details are not described here again.
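To illustrate S420 and S430, the sketch below shows how the same parsed index can resolve to different prediction modes depending on which candidate set, and hence which prediction-mode list, was determined in S410. The mode names, list contents and ordering here are invented for illustration; an actual codec defines its own lists and entropy coding.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical prediction-mode identifiers; real lists are codec-defined.
enum class PredMode { Skip, Merge, AffineFusion, Inter, Intra };

// Each candidate set corresponds to its own ordered prediction-mode list.
std::vector<PredMode> buildModeList(bool candidateSetHasAffineFusion) {
    std::vector<PredMode> list = {PredMode::Skip, PredMode::Merge,
                                  PredMode::Inter, PredMode::Intra};
    if (candidateSetHasAffineFusion)
        list.insert(list.begin() + 2, PredMode::AffineFusion);
    return list;
}

// S420/S430: the first indication information is indexing information into
// the list that matches the candidate set determined in S410.
PredMode decodePredMode(std::size_t firstIndicationInfo,
                        bool candidateSetHasAffineFusion) {
    const std::vector<PredMode> list =
        buildModeList(candidateSetHasAffineFusion);
    if (firstIndicationInfo >= list.size())
        throw std::runtime_error("indication index out of range");
    return list[firstIndicationInfo];
}
```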
[0196] Figure 10 is a block diagram illustrating an example of another video decoder 60 for decoding video data, according to one or more technologies of the present invention. [0197] The video decoder 60 includes: a first determination module 61, a parsing module 62, a second determination module 63 and a third determination module 64. [0198] The first determination module 61 is configured to execute S410 to determine, according to information about image units adjacent to an image unit to be processed, whether a set of candidate prediction modes for the image unit to be processed includes the affine fusion mode. [0199] The parsing module 62 is configured to execute S420 to parse a bit stream to obtain first indication information. [0200] The second determination module 63 is configured to execute S430 to determine, in the set of candidate prediction modes, a prediction mode for the image unit to be processed according to the first indication information. [0201] The third determination module 64 is configured to execute S440 to determine a predicted image of the image unit to be processed according to the prediction mode. [0202] Because the motion information of adjacent blocks is correlated, there is a very high probability that a current block and an adjacent block have the same or a similar prediction mode. In this embodiment of the present invention, prediction-mode information of the current block is derived from information about the adjacent blocks, reducing the bit rate of encoding a prediction mode and thereby improving encoding efficiency. [0203] Figure 11 is a flowchart illustrating an example operation 500 of a video decoder for decoding video data, according to one or more technologies of the present invention. Figure 11 is provided as an example. In another example, the technologies of the present invention can be implemented by using more, fewer or different steps than those shown in figure 11. According to the example method in figure 11, the video decoder 20 performs the following steps. [0204] S510. Parse a bit stream to obtain first indication information. [0205] The first indication information indicates whether a set of candidate modes for a first image area to be processed includes an affine motion model. This step corresponds to step S310 on the encoding side. [0206] S520. Determine a set of candidate modes for the first image area to be processed according to the first indication information. [0207] When the first indication information is 0, a set of candidate translational modes is used as the set of candidate modes for the first image area to be processed, where a translational mode indicates a prediction mode in which a predicted image is obtained by using a translational model. When the first indication information is 1, a set of candidate translational modes and a set of candidate affine modes are used together as the set of candidate modes for the first image area to be processed, where an affine mode indicates a prediction mode in which a predicted image is obtained by using an affine model. The first image area to be processed may be any one of a group of image frames, an image frame, a set of image tiles, a set of image slices, an image tile, an image slice, a set of image coding units or an image coding unit. Correspondingly, the first indication information is encoded in a header of the first image area to be processed: for example, for a group of image frames, in a video parameter set (VPS), a sequence parameter set (SPS) or supplemental enhancement information (SEI); for an image frame, in a picture parameter set (PPS) or an image frame header; for a set of image tiles or a set of image slices, in an image tile set header or an image slice set header; for an image tile or an image slice, in an image tile header (tile header) or an image slice header (slice header); and for a set of image coding units or an image coding unit, in an image coding unit set header or an image coding unit header. [0208] It should be understood that the first image area to be processed in this step may be preconfigured, or may be adaptively determined in the encoding process. The extent of the first image area to be processed may be known from an encoding/decoding-side protocol, or the extent of the first image area to be processed may be received in the bit stream from the encoding side, provided that this is consistent with the encoding side; no limitation is imposed on this.
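A minimal sketch of S510/S520 follows, assuming the first indication information has already been entropy-decoded from the area-level header named above: a 0 flag restricts the whole area to translational candidates, while a 1 flag adds the affine candidates. The function and type names are illustrative.

```cpp
#include <vector>

enum class MotionModel { Translational, Affine };

// S520: select the candidate motion models for the whole first image area
// from the one-bit first indication information parsed in S510.
std::vector<MotionModel> candidateModelsForArea(int firstIndicationInfo) {
    if (firstIndicationInfo == 0)
        return {MotionModel::Translational};  // translational modes only
    return {MotionModel::Translational, MotionModel::Affine};  // both sets
}
```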
[0209] It should be further understood that, in this embodiment of the present invention, setting the indication information to 0 or 1 is for illustration purposes, provided that this is consistent with the encoding side. [0210] S530. Parse the bit stream to obtain second indication information. [0211] The second indication information indicates a prediction mode for a block to be processed in the first image area to be processed. This step corresponds to step S340 on the encoding side. [0212] S540. Determine, in the set of candidate prediction modes for the first image area to be processed, a prediction mode for an image unit to be processed according to the second indication information. [0213] A specific method is similar to S320, and details are not described here again. [0214] S550. Determine a predicted image of the image unit to be processed according to the prediction mode. [0215] A specific method is similar to S330, and details are not described here again. [0216] Figure 12 is a block diagram illustrating an example of another video decoder 70 for decoding video data, according to one or more technologies of the present invention. [0217] The video decoder 70 includes: a first parsing module 71, a first determination module 72, a second parsing module 73, a second determination module 74 and a third determination module 75. [0218] The first parsing module 71 is configured to execute S510 to parse a bit stream to obtain first indication information. [0219] The first determination module 72 is configured to execute S520 to determine a set of candidate modes for a first image area to be processed according to the first indication information. [0220] The second parsing module 73 is configured to execute S530 to parse the bit stream to obtain second indication information. [0221] The second determination module 74 is configured to execute S540 to determine, in the set of candidate prediction modes for the first image area to be processed, a prediction mode for an image unit to be processed according to the second indication information. [0222] The third determination module 75 is configured to execute S550 to determine a predicted image of the image unit to be processed according to the prediction mode. [0223] Because the motion information of adjacent blocks is correlated, there is a very high probability that only translational motion, and no affine motion, exists in a given area. In this embodiment of the present invention, a set of candidate prediction modes is marked at the area level, avoiding the bit-rate cost of encoding a redundant mode and thereby improving encoding efficiency.
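Putting S510 to S550 together, the following sketch shows the two-level structure of the parse: the area-level first indication information is read once, and each block then spends bits only on the second indication information, which indexes into the area's candidate set. The `SymbolReader` stand-in and the three-entry mode set are assumptions made for illustration; a real decoder would use its entropy engine and codec-defined mode lists.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for the entropy decoder: yields already-decoded symbols.
struct SymbolReader {
    std::vector<int> symbols;
    std::size_t pos = 0;
    int next() { return symbols.at(pos++); }
};

enum class BlockMode { TranslationalMerge, TranslationalInter, AffineInter };

// Two-level parse: the first indication information fixes the candidate set
// once for the area (S510/S520); each block then signals only the second
// indication information, an index into that set (S530/S540).
std::vector<BlockMode> decodeAreaModes(SymbolReader& r, std::size_t numBlocks) {
    const bool areaHasAffine = (r.next() == 1);  // S510/S520

    std::vector<BlockMode> modes;
    modes.reserve(numBlocks);
    for (std::size_t i = 0; i < numBlocks; ++i) {
        const int idx = r.next();                // S530
        if (areaHasAffine && idx == 2)           // S540: affine index exists
            modes.push_back(BlockMode::AffineInter);  // only if area allows it
        else
            modes.push_back(idx == 0 ? BlockMode::TranslationalMerge
                                     : BlockMode::TranslationalInter);
    }
    return modes;  // S550 then forms the predicted image per decoded mode
}
```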
[0224] In one or more embodiments, the described functions may be implemented by hardware, software, firmware or any combination thereof. If the functions are implemented by software, they may be stored on a computer-readable medium as one or more instructions or code, or transmitted over a computer-readable medium, and be executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media (which correspond to tangible media such as data storage media) or communication media. The communication media include, for example, any medium that facilitates transfer of a computer program from one place to another according to a communication protocol. In this manner, computer-readable media may generally correspond to: (1) a non-transitory, tangible computer-readable storage medium, or (2) a communication medium such as a signal or a carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the technologies described in the present invention. A computer program product may include a computer-readable medium. [0225] By way of example, and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage or another magnetic storage device, flash memory, or any other medium that can store required program code in the form of instructions or data structures and that can be accessed by a computer. Furthermore, any connection may properly be termed a computer-readable medium. For example, if an instruction is transmitted from a website, server or other remote source by using a coaxial cable, an optical cable, a twisted pair, a digital subscriber line (DSL) or a wireless technology (for example, infrared, radio or microwave), then the coaxial cable, optical cable, twisted pair, DSL or wireless technology (for example, infrared, radio or microwave) is included in the definition of medium. However, it should be understood that computer-readable storage media and data storage media do not include connections, carriers, signals or other transient media, but are instead non-transient, tangible storage media. Disk and disc, as used in this specification, include a compact disc (CD), a laser disc, an optical disc, a digital versatile disc (DVD), a floppy disk and a Blu-ray disc, where a disk usually reproduces data magnetically, while a disc reproduces data optically with a laser. Combinations of the foregoing should also be included within the scope of computer-readable media. [0226] Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuits. Therefore, the term "processor" used in this specification may refer to the foregoing structure, or to any other structure that may be applied to implement the technologies described in this specification. In addition, in some aspects, the features described in this specification may be provided in a dedicated hardware and/or software module configured for encoding and decoding, or may be incorporated into a combined encoder-decoder. Furthermore, the technologies may be fully implemented in one or more circuits or logic elements. [0227] The technologies of the present invention may be implemented by a wide variety of apparatuses or devices, including a wireless handset, an integrated circuit (IC) or a set of ICs (for example, a chip set). In the present invention, various components, modules and units are described to emphasize functions of an apparatus configured to implement the disclosed technologies, but the functions do not necessarily need to be implemented by different hardware units. Rather, as described above, the units may be combined into one encoder-decoder hardware unit, or may be provided by a set of interoperable hardware units (including the one or more processors described above) in combination with appropriate software and/or firmware.
[0228] It should be understood that "an embodiment" mentioned throughout the specification means that particular structures, characteristics or features related to the embodiment are included in at least one embodiment of the present invention. Therefore, "in an embodiment" appearing throughout the specification does not necessarily refer to the same embodiment. Furthermore, these particular structures, characteristics or features may be combined in one or more embodiments in any appropriate manner. [0229] It should be understood that, in the various embodiments of the present invention, the sequence numbers of the foregoing processes do not indicate execution order, and should not be interpreted as any limitation on the implementation processes of the embodiments of the present invention. The execution order of the processes should be determined according to the functions and internal logic of the processes. [0230] Furthermore, the terms "system" and "network" may be used interchangeably in this specification. The term "and/or" in this specification describes only an association relationship between associated objects and represents that three relationships may exist. For example, A and/or B may represent the following three cases: only A exists, both A and B exist, and only B exists. Furthermore, the character "/" in this specification generally indicates an "or" relationship between the associated objects. [0231] It should be understood that, in the embodiments of this application, "B corresponding to A" indicates that B is associated with A, and B may be determined according to A. However, it should be further understood that determining B according to A does not mean that B is determined only according to A; B may also be determined according to A and/or other information. [0232] A person of ordinary skill in the art may be aware that, in combination with the examples described in the embodiments disclosed in this specification, algorithm units and steps may be implemented by electronic hardware, computer software or a combination thereof. To clearly describe the interchangeability between hardware and software, the foregoing has generally described the compositions and steps of each example according to functions. Whether the functions are performed by hardware or software depends on the particular applications and design constraints of the technical solutions. A person skilled in the art may use different methods to implement the described functions for each particular application, but the implementation should not be considered as going beyond the scope of the present invention. [0233] It may be clearly understood by a person skilled in the art that, for the purpose of convenient and brief description, for a detailed working process of the foregoing system, apparatus and unit, reference may be made to a corresponding process in the foregoing method embodiments, and details are not described here again. [0234] In the various embodiments provided in this application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the described apparatus embodiment is merely an example. For example, the unit division is merely a logical function division and may be another division in actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
Furthermore, the displayed or discussed mutual couplings, direct couplings or communication connections may be implemented through some interfaces. The indirect couplings or communication connections between apparatuses or units may be implemented in electronic, mechanical or other forms. [0235] The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one position, or may be distributed over a plurality of network units. All or some of the units may be selected according to actual requirements to achieve the objectives of the solutions of the embodiments. [0236] Furthermore, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units may be integrated into one unit. [0237] The foregoing descriptions are merely specific implementations of the present invention, and are not intended to limit the protection scope of the present invention. Any variation or substitution readily figured out by a person skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (4)

[0001] 1. Method for decoding a predicted image, CHARACTERIZED by the fact that it comprises: parsing (S510) a bit stream to obtain first indication information; determining (S520) a set of candidate modes for a first image area to be processed according to the first indication information, where, when the first indication information is 0, a set of candidate translational modes is used as the set of candidate modes for the first image area to be processed, where a translational mode indicates a prediction mode in which a predicted image is obtained by using a translational model, or, when the first indication information is 1, a set of candidate translational modes and a set of candidate affine modes are used as the set of candidate modes for the first image area to be processed, where an affine mode indicates a prediction mode in which a predicted image is obtained by using an affine model; parsing (S530) the bit stream to obtain second indication information; determining (S540), in a set of candidate prediction modes for the first image area to be processed, a prediction mode for an image unit to be processed according to the second indication information, where the image unit to be processed belongs to the first image area to be processed; and determining (S550) a predicted image of the image unit to be processed according to the prediction mode.

[0002] 2. Method according to claim 1, CHARACTERIZED by the fact that the first image area to be processed comprises one of a group of image frames, an image frame, a set of image tiles, a set of image slices, an image tile, an image slice, a set of image coding units or an image coding unit.

[0003] 3. Apparatus (70) for decoding a predicted image, CHARACTERIZED by the fact that it comprises: a first parsing module (71), configured to parse a bit stream to obtain first indication information; a first determination module (72), configured to determine a set of candidate modes for a first image area to be processed according to the first indication information, where, when the first indication information is 0, a set of candidate translational modes is used as the set of candidate modes for the first image area to be processed, where a translational mode indicates a prediction mode in which a predicted image is obtained by using a translational model, or, when the first indication information is 1, a set of candidate translational modes and a set of candidate affine modes are used as the set of candidate modes for the first image area to be processed, where an affine mode indicates a prediction mode in which a predicted image is obtained by using an affine model; a second parsing module (73), configured to parse the bit stream to obtain second indication information; a second determination module (74), configured to determine, in a set of candidate prediction modes for the first image area to be processed, a prediction mode for an image unit to be processed according to the second indication information, where the image unit to be processed belongs to the first image area to be processed; and a third determination module (75), configured to determine a predicted image of the image unit to be processed according to the prediction mode.

[0004] 4. Apparatus according to claim 3, CHARACTERIZED by the fact that the first image area to be processed comprises one of a group of image frames, an image frame, a set of image tiles, a set of image slices, an image tile, an image slice, a set of image coding units or an image coding unit.